ComfyUI Course - Learn ComfyUI From Scratch | Full 5 Hour Course (Ep01)
By pixaroma
Summary
Topics Covered
- Visual Thinkers Master Comfy UI
- Latent Space Speeds Diffusion
- Diffusion Removes Noise Step-by-Step
- Node-Based UI Reveals AI Process
- Portable Install Avoids System Conflicts
Full Transcript
Learning Comfy UI is like opening a technical book at the last page.
Everything is there, but nothing makes sense yet. This course starts from page one. Before we go any further, I want to be very clear about how this course works. This is not a shortcuts course.
It is not about copying workflows without understanding them. Each chapter
builds on the previous one. We start
simple, repeat the important ideas, and only add complexity when it actually makes sense. You do not need any coding knowledge. You do not need to be technical. If you can think visually, you can learn Comfy UI. If you want to understand how AI image generation really works locally and how to use Comfy UI without feeling lost, this
course is for you. My name is Pixaroma and on this channel I focus on creating and teaching Comfy UI workflows in a simple and practical way. I am a graphic
designer not a programmer and that is actually a good thing. Developers are
great at writing code but they often explain things in a very technical way.
This course is designed from a visual thinker's perspective. My goal is to explain Comfy UI logically and visually without needing any coding knowledge.
Even if Comfy UI looks confusing right now, that is completely normal. We will
start from the absolute basics and build up step by step. But before we talk about Comfy UI itself, we first need to understand what AI image generation
actually is. Today, AI is not just one thing. There are many different AI models that can run locally on your own computer, such as Stable Diffusion, Flux, Qwen, and many others. It is also
important to understand that Comfy UI is not limited to image generation. Comfy
UI is a general interface for running many different types of AI models locally. While it is most popular for image generation, it can also be used for audio, music, video, animation, 3D, and more. As long as a model can be connected through nodes, Comfy UI can be used as the interface to control it.
These models by themselves are like an engine. They are very powerful, but you cannot really use them directly. To work
with them, we need an interface. An
interface is what allows us to send prompts, images, and settings to the model and then receive results back.
There are many free interfaces that let us interact with these models. Some
popular ones are Forge UI, Swarm UI, Invoke, Fooocus, and of course, Comfy UI.
They often use similar models but they work in very different ways. In this
course, we are going to focus on Comfy UI. Comfy UI is different because it is node-based. Instead of hiding everything behind buttons and menus, it shows you exactly what is happening step by step.
You can see how prompts, models, samplers, and images are connected together like building a system. Think
of it like this. The AI model is the brain. The interface is how you talk to that brain. Comfy UI is like building your own control panel exactly the way you want. Do not worry if this still feels complex. Understanding comes from seeing things connect, not from memorizing nodes. In this course, I will explain what each node does, why it exists, and how everything connects together. Before we install anything, we need to talk about how you will actually run Comfy UI. There is more than one way to use Comfy UI, and the right choice depends on your system and your
expectations. Let's go to the official Comfy UI website to see the available options. The official website is Comfy.org.
If we go to the products section, you can see that there are two main options, Comfy UI Cloud and Local Comfy UI. Comfy
UI Cloud runs online on their servers and it is a paid service. This option
can be useful if your computer is too old or not powerful enough to run AI models locally. Local Comfy UI is free and runs directly on your own computer, assuming you have a reasonably capable system. This is the option we will focus on in this course. So, let's click on Local Comfy UI. Here you can see three main installation options: Download for
Windows, for macOS, and install from GitHub. All of these options install Comfy UI, but there are important differences between them. In this
course, I will focus on Windows operating system using the portable version of Comfy UI. All the workflows, tools, and installers I show are tested
on Windows using an NVIDIA graphics card. On AMD graphics cards and on macOS, performance is usually slower, and some features or custom nodes may not work exactly the same way. So, if you are using Windows with an NVIDIA card, it will be much easier to follow this course step by step as I show it.
Because there are many different AI models, hardware requirements can vary a lot. Some models are small and can run on a graphics card with 6 to 8 GB of VRAM. Other models are much larger and may require more than 24 GB of VRAM. For
this first episode, I tested the workflows on two different systems. One system uses an RTX 2060 with 6 GB of
VRAM and 64 GB of system RAM. The second
system uses an RTX 4090 with 24 GB of VRAM and 128 GB of system RAM. For the
workflows in this episode, a graphics card with 6 to 8 GB of VRAM should be enough to follow along. In later episodes, we will explore newer and larger models that may require more powerful hardware. Now, let's talk about which version of Comfy UI you should install. As I mentioned earlier, I am using the portable version of Comfy UI.
If we click on install from GitHub, we are taken to the official Comfy UI GitHub page. Here you can find detailed installation instructions, but they require more manual steps and setup.
Over the past year, I have been using a portable version of Comfy UI that includes additional tools to make the installation process much easier. This
installer installs the original Comfy UI, but it also adds helpful tools so you can get up and running much faster.
This installer is called Comfy UI Easy Install. You can find it on this GitHub page. You can find the creator on our Discord community under the username Ivo. Thanks to Ivo for this installer. This
entire course is built around this version of Comfy UI. You can still use Comfy UI Desktop or Comfy UI Cloud, but some things may look different or behave differently compared to what you see in
this course. If you want the exact same setup that I use and the easiest way to follow along, I recommend using the same version. Let me show you how to install Comfy UI. So, we are on the Easy Install GitHub page. This is the complete link.
If we scroll down, you can read more about this installer. Even if you might not understand what each of these things means yet, it will make sense later as you learn more about it. I will talk
later about the Pixaroma Discord server where you can get more help and answers to your questions. So, this installer will install Git, which is a tool that
tracks changes to files in Comfy UI. It
helps developers safely update the main app and custom nodes, fix bugs without breaking everything, and lets you update or roll back to an earlier working version if a new update causes problems.
Then it will install the Comfy UI portable version. A portable version means the program is fully self-contained in one folder, does not need a normal system install, and can be run, moved, backed up, or deleted without affecting the rest of your computer. In Comfy UI portable, this means Python, libraries, models, and settings all live inside the Comfy UI folder. So, you can copy it to another drive or PC, update it safely, and avoid breaking your system Python or Windows setup. Python embedded means Comfy UI comes with its own built-in copy of Python already included inside the Comfy UI folder instead of using the Python installed on your system. Then it
will install all the nodes that are useful and that I tested over the last year. It might not make sense for you yet if you are a beginner, but do not worry. Take it as general knowledge for now and it will make sense later. Then
it will add an add-ons folder with more advanced stuff we can use later to speed up our generation, plus some extra tools that can be useful and then more technical stuff explained for each one.
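As mentioned above, the portable build ships its own embedded copy of Python. If you are curious later, you can confirm which interpreter is actually running with a short standard-library check. This is optional and not part of the installer; it is just a sketch:

```python
import sys

# Which Python interpreter is actually running? In a portable
# ComfyUI install, this path should point inside the ComfyUI
# folder (the embedded copy), not to a system-wide Python.
print(sys.executable)

# The interpreter version, similar to what the ComfyUI console
# window prints at startup.
print(sys.version.split()[0])
```

If the first path points somewhere inside your Comfy UI folder, you know the embedded Python is the one in use.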
But all you need to know for now is where to download it from and how to run the installer. It is important not to run it as administrator. That means you just double-click to run it, not right-click and run as administrator, but you will see that in a minute. Also, avoid system folders and make sure your NVIDIA drivers are up to date, since some things work only with more recent NVIDIA drivers. Okay, let us go back to where it says Windows installation and let us download the latest release from here.
Then depending on how your browser is configured, it will either download it to the downloads folder or ask you where to download it and you can decide where to put it. As you will see over time, it
needs a lot of space if you download big models. So I suggest downloading and installing your Comfy UI on a hard disk that has a lot of free space, and preferably on a solid-state drive
because it will load the models faster.
I will go to my D drive and I will create a new folder called Comfy UI, but this does not really matter. The name
can be anything easy to remember.
Sometimes I put Comfy UI followed by the month so I know when I downloaded and installed it. So I will save this zip archive in that Comfy UI folder.
Now let us go to the place where we saved the file. Since this is a zip archive, we need to unzip it. You
right-click on it and, depending on what you prefer, you can use the Windows integrated option and select Extract All. I like to delete the folder name at the end so I do not end up with a folder inside another folder. When I click Extract, it will extract these two files. Let me delete it really quick and show you. If you have WinRAR like me, you just choose Extract Here and it does the same thing. Once we have extracted everything, you can delete the Easy Install zip file. Now we are left with two files: a BAT file that is the installer, and a zip file that contains extra resources that it will use. When you run it, you might get a security warning. That usually happens with BAT or executable files because they install files on your system. This one is safe. I installed and tested it, and I personally know Ivo, the creator of this installer. You can right-click and scan it with your antivirus and you will see it is clean. So let us double-click on the BAT file, then press Run, and it will
start installing. If you already have Git installed, it will update it. If not, you will get a window like this and you have to press Yes to continue the installation. After that, it will continue the installation of Comfy UI and everything it needs to run. You can take a break for 3 to 5 minutes, depending on your internet speed and your computer. So how do you know when it is ready? You will see a message that says installation complete, along with the time it took. On my PC, it took 247 seconds. After that, you can press any key to exit. So, let us recap really quick. From the GitHub page, you download the zip archive of the easy installer. You create a folder named Comfy UI and place the downloaded zip archive in that folder. You extract the contents of the archive. You run the Comfy UI easy install BAT file, and if it asks to install Git, you press Yes. In a few minutes the installation is complete and you get this screen. Now, after the installation, we can see that inside our Comfy UI folder a new folder appeared called Comfy UI easy install. This is portable, which means you can copy this entire folder and move it to a different drive or folder and it will still run Comfy UI. Basically, after you install any Comfy UI portable version, you should end up with a similar folder structure to this. Since Comfy UI is based on the Python programming language, you will see many Python files and BAT files inside these folders, which are used to run those Python files. The easy installer will also create some shortcuts on your desktop.
If you use other versions of Comfy UI, they might not do this, and you would need to create the shortcuts manually.
That is one of the reasons I prefer the easy installer. It makes everything easier for us. Basically, we just extracted an archive and ran a BAT file, and we now have Comfy UI. If we right-click on this shortcut and go to Properties, we can see the target of the shortcut. If we open the file location, you can see that it is connected to this BAT file that starts Comfy UI. In other versions of Comfy UI, the name might be different, like run NVIDIA GPU or something similar. Are you ready for your first Comfy UI launch? To start Comfy UI, you either use this BAT file called Start Comfy UI or, from the desktop, you use this shortcut called Comfy UI EZ installer. E and Z stand for easy, and I stands for installer. So double-click on it, and when it starts, it looks like this. The first time it will be a little slower, but after that it will start much faster. If you are curious by nature, you can find all kinds of
information about your Comfy UI and your system when it runs. For example, you can see what operating system I am using, what Python version is running, and where that Python is located, what
the path to the Comfy UI folder is, where the user directory is, how much VRAM you have, how much system RAM you have, and the PyTorch and CUDA versions
that are running. When it starts, this command window will be minimized to your taskbar, and Comfy UI will open in your default browser. The first time it opens, it will show you some templates made by Comfy UI that you can load. If you have run a workflow before, it will open that workflow by default. So the workflow you see open is the last one you used. A workflow is a set of connected nodes that tells Comfy UI what to do step by step. Let us close this for now. You can open these templates or workflows from here later. Comfy UI is made of a few main areas. You do not need to memorize them. I am naming them
so we can talk about the same things later. If we go to the top, you can see it says unsaved workflow. Basically, it is like a document that is empty at the moment, since we did not add any nodes yet. You can have multiple documents open, similar to what we have in Photoshop and other programs. We can click on this plus icon to create a new blank workflow. All these tabs on top are open workflows, and we can close, save, and edit each one. Now, this grid-like empty space is called the canvas.
Instead of drawing on it, we will arrange blocks or nodes like using Lego pieces and connect things together to create a working workflow. You can use
your mouse wheel to zoom in and out on the canvas. Then we have this top bar.
Depending on what extensions you have installed, it might look different and have more options. Things like the manager or the run button, which lets us run workflows, are usually here. On the
bottom right, we have view controls. For
example, we have a select tool that lets us select nodes and a hand tool that lets us navigate the canvas. You can fit a workflow in view, but right now the canvas is empty. We also have different
zoom controls that you can use if you do not want to use the mouse wheel or if you do not have one. For me, the mouse wheel is the fastest and the one I prefer. Then we have the show mini map option. This shows a small map that we can use to navigate when we have very large workflows. There is also hide links, but since we do not have any nodes or links yet, we will see that later. An important one is the main menu, which you open by clicking on the letter C, the Comfy UI logo. We also have more options on the left sidebar for nodes,
models, and workflows, which we will explore soon. Back to the main menu. If
we click on the C, we get this menu. New
creates a new workflow, but it is faster to use the plus sign from the top bar.
For file, it allows us to open, save, and export the workflows we create. For
edit, you can undo actions like moving nodes or changing something in the workflow, clear the workflow, and unload models. For view, we can enable and disable different panels. And we also have zoom in and zoom out controls. Just
like in Photoshop, we can do the same things in multiple ways. It is the same with Comfy UI. For theme, you can change how it looks, but at the beginning, I
suggest leaving it on default so it is easier to follow tutorials. Nodes 2 is in beta at the moment of this recording and still has some bugs, so I suggest leaving it off until it is more stable.
You can browse templates and open settings, which we can explore later.
For now, the default settings work fine.
Templates and settings can also be accessed from here. So again, there are multiple ways to access the same things.
In some newer versions, some people might use a newer manager and it might appear somewhere else instead of here.
For now, I am using the old manager which appears here. Under help, you can also find help options, but you will see later in this video how to ask questions
and get help. We also have a console, sometimes called the bottom panel, where you can see exactly what has happened since we opened Comfy UI. If we look at the taskbar and open the command window from there, it shows the same information. One view is at the bottom and the other is in the taskbar. To close Comfy UI, I recommend opening the command window from the taskbar and closing it. You will then see a reconnecting message in the browser, because it cannot find Comfy UI running anymore. After that, you can close the browser window. You can also close the browser window first and then close the command window. It is time to test our first ready-made workflow. Later, I will explain in detail what nodes are and what they do. So let us start Comfy UI, wait for it to finish loading, and get the interface. To open a workflow, you have different options. You can drag a workflow directly onto the canvas, or you can go to the menu, then File, and choose Open. All workflows for Comfy UI have the extension .json.
JSON means JavaScript object notation.
It is a simple text format used to store and share data in a way that both humans and computers can easily read. In Comfy
UI, JSON is important because workflows are saved as JSON files. These files
store all your nodes, connections, settings, and prompts so you can reload, share, or edit a workflow later. I will
include these workflows for free on Discord for those who use a different Comfy UI version. For example, I can open this first workflow and you can see that it opens with all the nodes and
links ready to be used. You can use your mouse wheel to zoom in and out to see the entire workflow. You can click outside the nodes somewhere on the
canvas and drag to move around. You can
also use this hand tool, which I personally never use. With the hand tool, you can pan around the canvas.
With the normal mouse cursor, you can select nodes and move them around. We
will talk more about that later. Now we
have the workflow open in this tab and you can see its name at the top. With
the X button, you can close the workflow and go back to a new empty one. If we go to the sidebar and click on workflows, I can open this folder called getting started which I prepared for you for
this episode. Only the easy installer comes with these workflows. So, if you are using a different version, you can get the workflows from Discord. You can
also make the sidebar wider if you want to see the full text. I added a few workflows here to test in this episode.
This one is just a help file with notes and useful information that we will use later in the video. Let us close it and open the one with number one in front
called Juggernaut Reborn. If I click on workflows again, the sidebar collapses.
Now let us move around using the mouse.
Left click and drag to see it better.
Each of these blocks is called a node.
All nodes are connected to each other using links, those small cables that go from one node to another. Usually, a
workflow is built from left to right.
When you run a workflow, it processes from left to right. If something does not work or the workflow is broken, Comfy UI tells you where the problem is.
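To make that left-to-right idea concrete, here is a toy sketch in Python. This is not real Comfy UI code, and the node names are invented for illustration; it only mimics how each node waits for the nodes it is linked to:

```python
# A hypothetical mini "workflow": each node lists the nodes it takes
# input from, mimicking the links (cables) between nodes.
workflow = {
    "load_checkpoint": [],
    "prompt": [],
    "sampler": ["load_checkpoint", "prompt"],
    "save_image": ["sampler"],
}

# Process nodes left to right: a node can only run once everything
# it depends on has already run.
done = []
while len(done) < len(workflow):
    for node, inputs in workflow.items():
        if node not in done and all(dep in done for dep in inputs):
            done.append(node)

print(done)  # ['load_checkpoint', 'prompt', 'sampler', 'save_image']
```

This is only a mental model; the real Comfy UI engine does much more. But the ordering idea is the same: a node can only run after the nodes feeding it have produced their outputs.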
Think of it like a car dashboard. If a
door is open or a light bulb is not working, you get a warning icon. It is
the same here. Errors look like this. It
says prompt execution failed, and it also tells you something like value not in the list. These are some of the simplest errors to fix. It is like the car telling you a light bulb is missing. In Comfy UI, it means a specific value, object, or file could not be found. In our case, it could not find the checkpoint name, which is the model name, the brain as we called it. Comfy UI workflows include all the nodes and settings, basically the interface, but they do not include the models themselves. Those brain or engine files are not included. Since workflows are just JSON text files, they cannot include images or large files like models. In this node called load checkpoint, the checkpoint is just a model file, the brain we talked about.
Even if I click here, I cannot select anything because it is not in the list.
That means the model is not downloaded yet or it is downloaded but placed in the wrong folder. Since I did not download any models yet, it is clear I
do not have it. That is why when I share a workflow with you, I include a note that tells you exactly what you need to download for the workflow to work. Not
everyone on the internet does this, but most good workflow creators do. The way
I organize it is like this. I tell you where the model needs to be downloaded and which node loads it. It says load checkpoint, which is the node name. Then
it tells you the model name you need to download. There is a button that says here, and then it tells you exactly which folder to place it in and which folder to create if it does not exist. That is
enough theory. Let us download the model. You already saw where it needs to be placed, but how do you find that folder? You need to find your Comfy UI folder. This depends on where you installed it, on which drive, and in which folder. You navigate until you find the Comfy UI folder. In our case, it is inside the Comfy UI easy install folder. If we go inside, we see many folders that Comfy UI needs to run. We have an output folder where generated images are saved. We have an input folder where input images are stored. We also have a models folder where all downloaded models go. Inside the models folder, you can see many subfolders for different types of models. Over time, you will learn what each one is for. That is why I included the note so you know exactly where to put the model without guessing. For this workflow, the model goes into the checkpoints folder.
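As a sketch of that folder logic, here is how the checkpoints path is composed. The base path and folder names are just examples from this episode; yours depend on where you extracted the portable folder:

```python
from pathlib import Path

# Example base path only; adjust it to wherever you extracted
# the portable ComfyUI folder on your own drive.
base = Path("D:/ComfyUI/ComfyUI-Easy-Install/ComfyUI")

# Checkpoint models (the "brain" files) go under models/checkpoints.
checkpoints = base / "models" / "checkpoints"
print(checkpoints)
```

The downloaded model file would then be placed inside that checkpoints folder.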
We could just save it directly there and it would work. But from my previous tutorials, I learned that over time, you will download many models and it becomes hard to keep track of them. That is why
I like to organize models in subfolders.
In this case, I know this model is based on stable diffusion 1.5. So, I will create an SD15 folder and place the model inside it. Now, we wait for the
model to download. Some models are a few gigabytes in size. After that, we go back to Comfy UI. You can see I placed the model exactly where the instructions
said, so Comfy UI can recognize it. If
Comfy UI was closed, reopening it would automatically detect the model. But
since Comfy UI is already open, it will not see the new model yet. We need to refresh it. To do that, press the R key. You will see that the node definitions update. Now, when I click here, I can see the model name and select it. Right now, there is only one model, but later you will have a drop-down list with many options. Now, the model is selected and it is time to run the workflow again. By
the way, you can move the run button anywhere you want on the canvas using the small dots on its side. If you
prefer it docked, you can dock it back to the top bar. Let us run it again and see if it works. Everything turns green.
Each node runs from left to right and no red nodes appear. That means the workflow ran successfully and we generated our first image. The model we use in this chapter is quite old and
small. Later, we will use smarter and more advanced models. For practice, this one is good enough because it is fast and can run on smaller computers that do
not have a lot of VRAM. Each time I press run, I get a new image because we have a random seed here. Do not worry about this yet. I will explain it later.
So now we can generate an image with Comfy UI, and all of this comes from a simple text called a prompt. Basically, we used a few nodes with specific settings and a model trained for this type of image generation. We can change the prompt, for example, photo of a cat closeup. Now, when I run the workflow, I should get a cat. The more VRAM you have, the faster it will generate. We can see the generated images here, but they are
also saved locally. If we look at the output folder, we have a shortcut to it on the desktop. Inside that folder, we can see all the images we generated so
far. Let us go back to Comfy UI and close this workflow. I do not want to save it because I liked the prompts and settings it had before. So, I choose No. Now, we are left with an empty workflow, or you can click on the plus sign to create a new blank workflow. Before we
move to the next chapter, I recommend taking a short break. Research shows
that short pauses help your brain process and retain new information. Grab
a coffee, get some water, or take a quick bathroom break, then come back and continue. This chapter is about understanding the building blocks of Comfy UI and how they connect to form a workflow. We are in Comfy UI and we have this blank canvas and workflow. To add a node, you double-click on the canvas, and it will open a search box that lets you search for a node. For example, if I type the word load, it shows me load image, load checkpoint, and all kinds of nodes that let us load something. If we click on the load image node, it will be added to the workflow. The position where it is added depends on where you double-click on the canvas. You can also move it after. You just left-click on a node, hold the left mouse button, drag it to where you want it, and then release the button. To deselect a node, you just click anywhere on the empty canvas. For me, that is the fastest way to add a node. But there are other methods. For example, I can right-click on the canvas in an empty area and get this menu. From here, I can go to add node. Then I see different categories. If I click on the image category, I can find the load image node. It is right here. And if I click on it, it gets added to the canvas. After that, you can move it and arrange it wherever you want. Another method is to use the node library in the left sidebar. Here we have all these categories. If I click on the image category, I can see the load image node. This is a good option if you do not know exactly which node you are looking for. You can also search for a node here to filter the list, then add the node or drag it onto the canvas. Out
of all these methods, my favorite is still the double click on the canvas.
Once a node is selected, you also have the option to delete it using this icon.
You can also use the delete key or the backspace key to delete a node after you select it. The load image node is how we
select it. The load image node is how we bring an existing image into Comfy UI so other nodes can work with it. Each node
has a title at the top that tells you what it does. Below that, it has controls, inputs, and outputs that connect it to the rest of the workflow.
Let us double-click on the canvas again and add another node. This time, I will search for crop. And we get this node called image crop. You can probably guess what it does. It crops the image that we loaded. You can change the image using this button and upload any image you want. If something goes into a node, it is called an input. If something comes out of a node, it is called an output. The load image node has two outputs but no inputs, because the image comes directly from your computer, not from another node. The image crop node has one image input and one image output. It receives an image, modifies it, and then sends out a new image. If we left-click on one of the outputs from the load image node, we can drag a connection or a cable to the next node and connect it. Because the output and the input have the same color and the same name, it is easy to see that they belong together. In most cases, connections work between the same colors, and different colors usually do not connect. There are a few special cases, but we will talk about those later. Now, if I try to connect the green output, it does not connect. That is because the green output is a mask, and the input on this node expects an image, which is blue. In the beginning, this color system helps you quickly understand which nodes can be connected. If two nodes cannot connect, it usually means they are not meant to be connected. Sometimes you will also find nodes that act like adapters or converters. These nodes take one type of output and convert it into a different type so it can be used by another node.
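This color and type matching can be pictured as a simple rule. Here is a toy model in Python of how typed connections behave; this is only an illustration, not ComfyUI's actual code, and the connect helper is made up:

```python
# A toy model of typed connections: a link is only allowed when the
# output type matches the input type (illustration, not ComfyUI code).
def connect(output_type, input_type):
    if output_type != input_type:
        raise TypeError(f"cannot connect {output_type} to {input_type}")
    return (output_type, input_type)

print(connect("IMAGE", "IMAGE"))  # an image output plugs into an image input
try:
    connect("MASK", "IMAGE")      # a mask output does not fit an image input
except TypeError as err:
    print(err)                    # cannot connect MASK to IMAGE
```

An adapter or converter node is simply one whose input type differs from its output type, so it can sit between two nodes that would otherwise not connect.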
Now, basically, we have a workflow, but is the workflow complete? How can we test it?
It is simple. We run it and see what message we get. In this case, it says the prompt has no output. Even if you do not understand exactly what that means yet, try to figure it out from the words. We do not have an output node. So let us close this message. If we look at the workflow, the image is loaded from the computer. Then it goes into the image crop node, which crops the image. But after that, nothing happens. There is no output. Think of this like editing a photo in Photoshop. You load an image, crop it, but if you never save or export it, the work exists, but you do not get a file. In Comfy UI, the save image node is the export step. So let us double-click on the canvas again and search for save. We have this save image node. We can see that the image output color matches, so we can connect it. Even if the label says image or images, it still works. Now let us make the connection. If we run the workflow, it will process from left to right. The image is loaded, cropped, and then saved in the output folder. We can also see it directly inside the save image node. All nodes can be resized using the corners.
You will see small arrow indicators on the corners. You can click and drag to resize a node. In this case, I want to see the image preview bigger, so I resize the node. To remove a connection, you can left-click on the output dot, drag the cable out onto the canvas, and it will disconnect. You can also left-click on the small dot in the middle of the connection and choose delete. You also have the option to add a reroute. A reroute is like an extension cable or a cable organizer. It does not change the data at all. It only helps you route connections more cleanly and keep your workflow readable. From that reroute node, you can add another link to another node if you want. You can also have multiple reroutes on the same link, so you can arrange nodes, links, and reroutes in a way that looks visually clean or helps you see faster which node connects to which node. This is very helpful when you have a lot of nodes in a workflow. To remove a reroute, you just select it and press the Delete key.
Let me arrange them and remove all links so we can see it better. So in Comfy UI, we have different types of nodes. First, we have nodes that only have outputs. These nodes usually load something from outside Comfy UI, like a file or some text. In this case, the load image node loads an image from your computer, so it does not need any inputs, only outputs. Then we have nodes that have both inputs and outputs. These nodes are usually placed in the middle of a workflow. They receive something from one node, process it, and then pass the result to the next node. Finally, we have nodes that usually sit at the end of a workflow. These nodes only have inputs, and their job is to show or save the result, for example, by previewing an image or saving it to disk. There are also nodes that do not have any inputs or outputs at all. For example, if I search for a note node and add it to the canvas, you can see that it is only informational. These nodes are used to write notes and make workflows easier to understand and remember. They do not affect the workflow at all and are just for organization and clarity. On the top left side of a node, next to the title, you have a small gray dot. If you click it, the node collapses, similar to minimizing it. I often do this for nodes where I already know the settings and do not need to change them. Collapsing nodes helps save space and makes the workflow easier to read. If you right-click on a node, you get a menu called the node context menu. This menu shows options related to that specific node.
Each node has a slightly different menu depending on what that node does. In this case, we have options like opening, saving, and copying the image, different properties, resize, and colors. We can also collapse the node from here instead of using the gray dot. And there are many more options. Try a few of them. If you do not like what you did, you can undo it with Ctrl + Z. You can also change the title of a node. If you double-click on the node title, you can rename it to anything you want. This does not change how the node works at all. It is only for your own organization and to make the workflow easier to understand. You can also right-click on a node and choose title to rename it. This is another way to change the node name. We already know that we can move a node around once it is selected.
But when a node is selected, you will also see a small floating bar at the top. From here, you can delete the node using this icon. You can also click on this dot to change the node color. This lets you choose from different colors, which is useful for organizing your workflow or grouping nodes by function. Changing the color does not affect how the node works. By default, there is no color, the gray one. This small eye icon is the node info. If you click it, a properties panel opens on the right with more information about the node. Here you can see what the node is supposed to do and what the values mean. From this icon, you can also close the properties panel. All nodes, especially the default Comfy UI nodes, should have some kind of info, unless it is a custom node and the creator did not add any documentation. You can drag this side panel and resize it the way you like. Here you can see all the information about the image crop node and what each setting does. Let us close it for now. If you hover over an icon, you can see more information about what it does. For example, because this node works with images, it lets you open it in the mask editor. If we click on it, you can see that it opens in the mask editor. This will be useful later when we do inpainting and image editing, but that is for another episode. For now, it is enough to know that it is here and you will learn more about it over time. These numbers are just settings that you can change in Comfy UI. They are called parameters.
Parameters control how a node behaves and how it processes its input. Let me reconnect the nodes. So, we have a working workflow again. Now, if I run it, you can see what these parameters actually do. We are cropping a 512 x 512 pixel area from the original image, starting from the X and Y coordinates set to zero. That means the crop starts from the top left corner of the image. So basically we are taking this small corner from the original image. The original image was 1,024 x 1,024 pixels, and now the result is 512 x 512 pixels. Even if it looks bigger here in the node preview, it is not actually larger. That is just the preview size. The real image resolution is smaller. So let us change some parameters, or settings, or however it is easier for you to remember them. Values is also fine. And run it again. Now we have a different crop. Let me speed up the video while I try different values so you can see how the result changes.
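What those four parameters mean can be sketched in a few lines of Python. This is a conceptual illustration with a made-up crop helper, not ComfyUI's implementation: x and y pick the top-left corner, and width and height pick the size of the region.

```python
# Conceptual sketch of the image crop node's parameters
# (hypothetical helper, not ComfyUI code).
def crop(pixels, x, y, width, height):
    # pixels is a list of rows; take `height` rows starting at y,
    # and `width` values from each row starting at x.
    return [row[x:x + width] for row in pixels[y:y + height]]

# A fake 1024x1024 "image" where each pixel stores its own coordinates.
image = [[(r, c) for c in range(1024)] for r in range(1024)]
region = crop(image, x=0, y=0, width=512, height=512)
print(len(region), len(region[0]))  # 512 512
```

Changing x and y moves the corner where the crop starts, which is exactly what you see in the video when different values are tried.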
Let us remove this middle node, the image crop node. Once it is removed, something interesting happens. Comfy UI tries to keep the workflow connected and automatically reconnects the nodes directly. That happens because the output of the first node and the input of the last node use the same type. If the first and last nodes had different input and output types, the connection would disappear when the middle node is removed. Now let us double-click on the canvas. You should remember by now that every time you see this search bar, it means I double-clicked on the canvas. Let us search for invert and select invert image. This node has an image input and an image output, but it does not have any parameters. That is because this node is designed to do one specific thing: invert the image. Even without parameters, the node still performs a function. Let us connect this node into the workflow. Watch what happens when I connect it to the input. You can see that the previous connection is removed automatically. That is because an input can only have one connection at a time.
An output, on the other hand, can connect to multiple nodes. You can think of it like electricity. An output is like a power strip. It can send power to many devices. An input is like a wall socket. It can only accept one plug at a time. If we run the workflow now, we can see that the result is an inverted image. So until now, with these small workflows, we did not use any AI. We only used simple nodes, like simple code, to modify images.
We will see more in the next chapter when we build a bigger workflow that uses Stable Diffusion to generate an image from text. But these small steps help you understand how things work. At least I hope they do. You can always ask any questions you have on Discord, as we will have a special section for this episode on the Discord forum. So we learned that save image is usually the last node in a workflow, since it does not have any outputs, and because the output is an image that goes to disk, not to another node. But that does not mean we cannot continue the workflow. It only means we cannot continue from that node. We can still continue from the previous node, which has the same image, just not saved to disk yet. Let us clone this node and use it again. One simple way is to press the Alt key and drag a copy of this node where you want it. Let us delete it and try again. Now we will use Ctrl + C to copy. And when I use Ctrl + V, it will paste that node where the mouse cursor is. Let us delete it again. And now let me show you another shortcut. Press the Control key and make a marquee selection over the nodes you want to select. Now we selected two nodes. With Ctrl + C, we copy all selected nodes. And with Ctrl + V, we paste them. If you click and drag from any of the selected nodes, you can move them together. If you press Delete while both are selected, it will remove both. If we use Ctrl + Shift + V, it will paste the nodes together with the links they had in the workflow. Now we have this extra link here. So basically, from one image, we got two invert image nodes, and both do the same thing: invert the image. If the invert image node had more parameters, we could change the settings in one and get different results. Let us delete those again.
Practice this a few times. Press Control and select the nodes. Press Ctrl + C to copy and Ctrl + V to paste. Move them into position. Now look at what I am doing. I am continuing the workflow from the last invert image node, and then I save the result. A workflow can have many branches, like a tree. The root starts with the image. Then it inverts the image, and from there, on another branch, it inverts it again. Can you guess what happens when I press run? The image is inverted again and looks like the normal image. The original image is inverted. Then that inverted image is inverted again, and we get the original result. We can continue the workflow even more.
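The double inversion above can be shown at the pixel level. This is a conceptual sketch assuming 8-bit channel values, not ComfyUI's code: inverting replaces each value v with 255 - v, so applying it twice gives back the original image.

```python
# Pixel-level sketch of an invert node (assumes 8-bit values, 0-255).
def invert(pixels):
    return [[255 - v for v in row] for row in pixels]

original = [[0, 128, 255], [10, 200, 30]]
print(invert(original))                      # [[255, 127, 0], [245, 55, 225]]
print(invert(invert(original)) == original)  # True: two inversions cancel out
```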
From the image that was inverted twice, we connect it to an image crop node. Now, instead of double-clicking and searching for a node, we can drag a connection and release it. When we do that, a context menu appears with suggested nodes. From here, I can easily pick the save image node, and it is added already connected. Let us delete it and try again. Drag the connection and release it. Then select search. When I select the save image node, it is added already connected. Let us place the nodes properly and run the workflow. You can see how many operations are now in this workflow. With a single image, we can invert it, invert it again, crop it, and save it.
This is similar to a small program or an action in Photoshop, but with more control and much more flexibility. There are nodes for images, audio, 3D, and many other things. This is where you start to see the power of Comfy UI. Now let us select everything. Hold Control and drag to select all nodes. You can press Delete to remove everything, or let us cancel that and do it another way. Go to the menu, then edit, and choose clear workflow. It will ask if you want to clear it. Click okay. And now we have a blank workflow again. Do you like math?
I know you do not like it, but I just want to show something quick to see the different things it can do and help you understand Comfy UI better. Double-click on the canvas and search for math. You will see a few math nodes. If we look on the right, you can see different names like Comfy UI core, KJ nodes, easy use. These are the names of custom nodes or extensions. By default, Comfy UI comes only with the nodes you see under Comfy UI core. With the easy installer, you also get a few extra custom nodes already installed. We will talk more about custom nodes later when we get to the manager. If you use the easy installer like I showed at the beginning of this video and install the same version, you should have the same nodes. So again, Comfy UI core nodes are made by the Comfy UI team. Let us get back to the math nodes. We will start with something simple called math int. Int comes from the word integer, which means whole numbers. This node works only with whole numbers like 1, 2, 10 and so on, not decimals. All custom nodes have an extra label on the top right that shows which custom node pack they belong to. This makes them easy to spot compared to built-in nodes. These math nodes are used for simple calculations, similar to a calculator. I personally do not use math nodes very often, but they can be very useful for automation. For example, you might load an image, read its width or height, and then use math nodes to calculate a new value based on that size. This allows you to automatically adjust things like resolution without manually changing numbers every time. In this case, we have letter A, letter B, and an operation. The default operation is add.
Let us set A to 5 and B to 3. For the operation, we will leave it on add for now. Let us add another node that I use often, called preview as text. You can see it comes with Comfy UI. This is one of those special nodes I mentioned earlier that can be connected to almost anything. Even if other nodes cannot connect directly, this node will convert the value to text and display it. If I run the workflow, you can see the result. Even though they look the same, one is actually a number and the other is a text display of that number. This makes more sense if you have coding experience, but we will not get into technical details here. What is important to remember is that we can use this node to see a result as text. It also has options for how the preview is displayed. Let us move this node down and make a copy of the math int node.
Now remember, we have a result in this node, but it is not visible unless we use a node to preview or save it. Here is something interesting. We do not see any inputs on this node, but when we drag a link to it, two input dots appear. This means we can actually connect values directly to these fields. You will see this behavior with many nodes that have number fields or text fields. We can copy a value from an output and feed it into these fields to use it in the workflow. Notice how the field for letter A is grayed out. That means it is no longer using a manual value. Instead, it is taking the value from the previous node, which is 8. Now let us change the operation to multiply. We now have 8 multiplied by 3. Let us add another preview as text node to see the result. When we run it, the result is 24, as expected. Let us remove that preview node and arrange the layout. This small workflow does something simple. It adds two numbers, and then the result is multiplied by three. That three could also come from another node, and so on, until you build more complex workflows.
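The whole math workflow boils down to two operations chained together. Here it is as plain arithmetic, using the values from the video; this is just the math, not ComfyUI code:

```python
# The small math workflow: an "add" node feeding a "multiply" node.
a, b = 5, 3
added = a + b        # the first math int node, operation set to add
result = added * 3   # the second math int node, operation set to multiply
print(result)        # 24
```

The point of building this with nodes instead of typing the numbers is that any of these values could come from another node, for example an image's width or height.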
If I change the value to four and run it again, we get the correct result for that formula. I hope this was not too much math. Now let us select the middle node that does the multiplication and right-click on it. We have a function called bypass. When we enable bypass, the node is temporarily ignored, as if it is not part of the workflow. By the way, you can also access bypass quickly from this icon when the node is selected.
Now, if I run the workflow, you can see it ignored that node and only did the addition. If I enable it again and run the workflow, it takes that node into account again. You can see that the node changes color and becomes purple and semi-transparent. This visual change tells us that the node is deactivated. Bypassing a node is useful when you want to test a workflow without removing the node completely. Hold the Control key and select all three nodes. Now, if we right-click on an empty area of the canvas, we have the option to add a group. If we choose add group, it will create an empty group. But since we already selected the nodes, it is better to choose add group for selected nodes. This creates a group that contains all those nodes. You can think of a group like a folder that holds multiple nodes together. One very important thing to remember is that if you want to move the group with all the nodes inside, you need to drag it using the group's top bar. If you select and move an individual node, it will move outside of the group. Groups can also be resized. You can see a small triangle in the corner that lets you change the size of the group. If you right-click on a group, you also have the option to bypass all the nodes inside it. This is very useful when you have multiple workflows on the same canvas. For example, you can deactivate one workflow and enable another so only one workflow runs at a time. This becomes important as workflows get more complex and models get larger, because running multiple workflows at once can require more resources than your system can handle. If you double-click on the group title, you can change the group name. Enough with math. Let us work a little with text as well. When we use AI, we give it prompts. And sometimes it helps to combine text from different sources to get a better prompt.
Now I am searching for concat. And you can see that there are multiple nodes with similar names. That is because concatenate is a general concept, and it exists for different data types. This one here, concatenate, works with strings, which means text. It simply takes multiple pieces of text and joins them together into a single string. That is why I added this cat made from multiple pieces joined together, to make it easier to remember. Even if you search for cat, you can easily find the concatenate node. Let us add it to the canvas. For example, for string A, I add the word home, and for string B, the word car. When I connect them, the output becomes a single piece of text, a string. Let us drag a connection from that string and search for a node that can preview it. We can use the same preview as text node again. Now, because the first workflow is bypassed, it will only run this workflow with concatenate. You can see how it joined those two words, first home, then car. We can also use a delimiter. For example, I can add a space here and run it again. Now the result has a space between the words. Or I can add a comma and a space, and now the result looks like proper text with separation.
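If you have never seen concatenation before, here is the same idea in plain Python, with and without a delimiter; the node does essentially this with its string fields:

```python
# String concatenation, with and without a delimiter.
a, b = "home", "car"
print(a + b)               # homecar   (no delimiter)
print(" ".join([a, b]))    # home car  (space as delimiter)
print(", ".join([a, b]))   # home, car (comma plus space as delimiter)
```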
Let us move this node down and hold Alt and drag a copy of the concatenate node. You can move nodes around to make room so the connections are easier to see. Here we have these text fields. And like you saw with the math nodes, we can connect outputs directly into these fields if they are the same type. When I drag a link from the string output, you will see input dots appear, showing that I can connect there. I can connect to the first field, the second field, or even the delimiter. Let us connect it to the first field. So now the home and car result becomes the first input. And for the second field, let us add the word flower. I will hold Alt again and drag another duplicate since that is faster. Then I connect it. Can you guess the result? We now get home, car, and flower. So basically this is how people create workflows. You connect nodes like Lego pieces. Some nodes can be connected together because they share the same input and output types, and you get a result. Over time, you can build more complex workflows that can save you a lot of time.
Let's add another node. Double-click on the canvas and search for primitive. This node is called primitive because it represents the most basic types of values. Things like numbers, text, and true or false values are considered primitive values. The primitive node is used to manually create a value inside Comfy UI instead of getting it from another node. You can use it to type in a number, write some text, or define a simple value that can then be connected to other nodes. Think of it like writing a value by hand and injecting it into the workflow. You can see here it says connect to the widget input, so we can drag a link from there. Now you can see we have a lot of inputs where we can connect this value. If we look at this text, notice what happens when the connection is complete. It changes to the type of value that was connected, a string. Now we can manually insert any text value there. When we run the workflow, the result will include that value. This is useful because sometimes you want to use the same value in multiple nodes. Instead of typing it manually each time, you add a primitive value once and connect it to all the inputs that need it. Let us right-click on this group. Usually nodes have a bypass option to disable or enable them. But for groups, this option is called set group nodes to always. Now the nodes inside the group are enabled, and we can run that workflow if we want.
another primitive node to see how it adapts. Last time when we connected a
adapts. Last time when we connected a primitive node to a text field, it automatically converted the value into a string because that input expected text.
Now if we connect a primitive node to a math int node, it adapts differently.
This time it is converted into an integer value. You can see that now we
integer value. You can see that now we can only enter numbers. It does not allow text because this node expects an integer. Right now the value is set to
integer. Right now the value is set to five. Let me resize the node so we can
five. Let me resize the node so we can see it better. You can clearly see that this one is an int and the other one is a string. If we change this number and
a string. If we change this number and run the workflow, Comfy UI will rerun all the workflows on the canvas using the new values. In this simple example,
it runs almost instantly. But later when we use larger models, you will see that some workflows can take minutes to generate instead of seconds. So if you look at all the nodes in these
workflows, you can clearly see that we use this easyuse custom node and all the rest do not have that label. That means
they come with comfy UI by default. Let
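The way the primitive node adapts can be pictured roughly like this. This is a toy illustration, not how ComfyUI is implemented, and the adapt helper is made up: the same raw value behaves as text or as a whole number depending on what the receiving input expects.

```python
# Toy model of a primitive value adapting to the input it feeds
# (made-up helper, not ComfyUI code).
def adapt(value, expected_type):
    return expected_type(value)

print(repr(adapt("5", str)))   # '5' -> stays text, like a string field
print(repr(adapt("5", int)))   # 5   -> a whole number, like a math int input
```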
Let us go to the menu, then edit, and choose clear workflow. Now I want to do a quick recap, just to make sure you assimilated some of the basics. We double-click on the canvas to bring up this search option, so we can search for nodes. You type a word to search, like load, and then you can select a node. For example, the load image node. This node is used to load an image from your computer. If we click choose file to load, we can navigate our computer and load an image. By the way, I asked EVO to include some images for this first episode in the input folder, so you can have the same images I am using. The path is the Comfy UI easy install folder, then Comfy UI, then input. Let us say I select this helmet, but it can be any image. We can choose open, or we can double-click on the image and it will open. Now that the image is loaded, let us add another node, the image crop node, and connect it from left to right, from output to input. To remove a connection, you just drag from the output and release it somewhere on the canvas. It is like unplugging a cable and leaving it on the floor. Let us redo the connection. You can also click on the small dot in the middle of the connection and choose delete, and it does the same thing as unplugging.
Let us connect it again. We can hold Control and select multiple nodes by dragging with the left mouse button pressed. This lets us select multiple nodes at once. You can also hold Control and click on the nodes one by one. If you select a node by mistake, just click on it again to deselect it. Once nodes are selected, we can move them together. If you plan to move them a lot, you can also add them to a group. If you click on the canvas and drag, you are moving the canvas itself. This is useful when you have long workflows and want to see different parts of the workflow. Let us move these nodes to the left. Now, double-click on the canvas and add a node called preview. This time, it is not preview as text. We could add that too, but it would show numbers. Here we add preview image. This node is similar to save image, but it does not save the image in the output folder. It is useful when you just want to preview parts of a workflow and do not need to save the image. If you like the image, you can still save it. You can right-click on the image and choose save image, then save it anywhere you want on your hard disk.
Let us cancel that and remove the preview image node. This time let us add a save image node so we can save the result and then run the workflow again.
Now the result is saved. Let us go to the desktop. The easy installer comes
the desktop. The easy installer comes with a shortcut to the output folder. We
double click on it and now we are in the output folder. You can see the path at
output folder. You can see the path at the top. Here you can see all the images
the top. Here you can see all the images generated with Comfy UI. These images
are saved in this folder. You can delete them, move them to different folders, or organize them however you want. I
usually pick the images I need, move them into the project folder, and then delete the rest because over time you will end up with thousands of images.
Here we can see the helmet image we just generated, but not the previous one from the preview image node. If we go back one folder to the main Comfy UI folder,
we can see a temp folder. Comfy UI uses this folder to store preview images temporarily. The contents of this temp folder are deleted when you start Comfy UI again. So, you can still recover preview images even if you did not save them right away, as long as you did not close Comfy UI yet. We can collapse a
node using the top left gray dot and click it again to expand it. Once a node is selected, it has multiple options at the top. One very useful option is the info icon, which gives you more information about the node. You can
close the properties panel from here. We
can change the color of the node. And
this symbol here is for subgraphs.
Subgraphs are a bit more complex. So
maybe in a later chapter or another episode, we will talk more about them.
This arrow lets you bypass the node. And
if you click on these dots, it shows even more options for the node. For
example, you can change the shape, change the color, or pin the node so it is fixed and cannot be moved. If we move these nodes apart to see the links, we
can also hide the links from here. Be
careful with that because it can look like no nodes are connected. I never use this option because I like to see how nodes are connected. It helps me
understand the workflow better. If you
do not like how the links look, there are ways to change their shape. I like
the default look, but some people prefer other options. If we go to the bottom left or open the menu and go to settings, we can change this. Let us
click on settings. Here we have many settings we can change. Let us search for link since we want to change how links are displayed. You can see the current one is called spline under link
render mode. If we change it from spline to straight and close the settings, you can see the links are now straight, but they still adapt when you move the
nodes. Let us go to settings again and change it to linear. Now the links are always straight lines.
Let us change it back to the default which is spline. Now let us remove all the nodes.
I personally prefer the spline view because it reminds me of sci-fi scenes with lots of cables hanging around.
Let us double click on the canvas and add a load image node again. Now let us add another node and search for upscale image. We will select the node called upscale image by and move it closer so we do not waste space on the canvas.
Then we connect it to the workflow and add a save image node at the end. So we
have a complete workflow. If we run this now we get the same image as before.
That happens because some nodes have default values that do not change anything. They only start doing something once you change their parameters. In this case, the upscale value is set to one. Scaling by one is like multiplying a number by one. You
get the same result. Now, let us increase the scale by value to two. When
we run it again, the image is upscaled by two times. So, we get double the resolution. This is similar to resizing an image in Photoshop. In this case, it does not use AI to add new detail. It
simply enlarges the image. These upscale
methods are different ways of resizing an image. Nearest exact copies pixels exactly, so it is very fast and keeps hard edges, but it can look blocky. Bilinear smooths pixels together, giving a softer result that can look slightly blurry. Area is mainly meant for downscaling and is not ideal for upscaling images. Bicubic uses more surrounding pixels to produce smoother and better looking results. Lanczos preserves detail and sharpness the best, but it is slower than the others.
Let us say you do a lot of changes to a node like titles, values, and colors, and you forget how the default values were. You can add the same node again and redo all the values and connections, or you can right-click on the node and choose fix node, recreate. If you select that, the node goes back to its default
state. If you right-click again, you also have the option to clone the node and move that clone wherever you want. You
can also do this faster by holding alt and dragging the node. You can remove a node from this menu, but pressing the delete key is faster. We also have this
pin option. If you use it, a pin appears at the top of the node. And now if you click and drag, the node does not move.
It is pinned to the canvas. To move it again, you need to right-click and select unpin. Sometimes when you get workflows from the internet, some people stack many nodes on top of each other and pin them. It can look like there are only two nodes, even if there are 10 nodes behind one. I do not recommend doing this. If you do not want people to use your workflow, just do not share it.
We also saw that we need to change values on some nodes for them to work.
If we bypass a node, the workflow still runs and ignores that node. The links
are still there and the connection passes through the node. This is
important because there is another mode where the connection does not pass through. Let us enable the node again using bypass. Now right-click on it and go to mode. You will see the option always which means the node is active.
There is also an option called never.
This mutes the node and makes it behave as if it does not exist. You can see that the node is now gray, not purple like bypass. When we run the workflow, we get an error. That is because the node is not passing anything through. So
the next node does not receive the image it expects. It is like cutting the cable where that node was. Let us remove that node and delete the link as well. If we
run the workflow again, the result is the same. The image is missing because there is no connection. Let us go to the menu then edit and click undo. You can
also use Ctrl + Z multiple times until you get back to the state you want. I
will undo until everything is active again. Let us zoom out with the mouse wheel and add a group. Name your group in a way that explains what the nodes do. Do not name it something generic like my group. You can move the group around and you will notice that it works like a magnet. If a node is inside the
group area, it stays inside. Let us make the group larger and move it around. So
you can see that nodes are sticking to it. Now adjust the group size and move the nodes so it looks cleaner. Workflows
can get quite large, so I like to optimize the space to make them easier to read and navigate. Once the nodes are positioned, hold control and select all the nodes. You will see that only the nodes are selected, not the group. Now
right-click and choose fit group to nodes. The group will resize to fit the nodes tightly. Now we can move the group and it does not take much space. Groups
can have more options especially when using custom nodes like rgthree. If we go to settings in Comfy UI, we have general settings, but we also have settings for custom nodes. For example, for rgthree, we have extra settings here. I can click this button to open them. rgthree is installed when you use the easy installer.
If you install Comfy UI manually, you need to install rgthree from the manager. We
will talk more about custom nodes later.
You can also access the same rgthree settings directly from here, which is faster. If we scroll down, we have settings for groups. For example, there is an option called show fast group toggles in group headers. Let us enable
it. You can choose when to show it always or on hover. I will leave it on hover and save the settings.
Now when I hover over the group, you can see extra buttons in the top right. From
here, we can bypass all the nodes in the group easily. We also have an option to mute the group. This is similar to setting nodes to never. When a group is muted, the nodes inside it do not run at all and the workflow behaves as if they
do not exist. Bypass still lets the workflow run through the nodes. Mute
does not. Let us make the group active again. We can also run the workflow using the play button on the group. We
can change values, for example, use smaller values to get a smaller image, maybe half the size. There are many more things you can do with groups and switch style nodes, but we will cover those in
later episodes. I told you at the beginning to leave the Nodes 2 option turned off. At the moment of this recording, it is still in beta and has some bugs. Maybe over time they will fix everything and it will become stable. If
I activate it, you can see that it changes how the nodes look. For most
nodes, things still work in a similar way, but this change exists. So, they
can add more functionality to nodes.
With the current system they use, there are limitations in what nodes can do.
And the new node version should give them more possibilities to build better nodes. Instead of the gray dot, you get an arrow that points down and then to the right when the node is collapsed.
The inputs are placed on the edge of the node and some nodes have more options.
For example, in load image, you can see previews of images from the input folder. And you can also browse for another image on your disk. However, for older workflows, this can slightly change node sizes and mess up the layout. Some nodes might not work yet until the node creators update them.
Because of that, until everything is fixed and stable, I recommend leaving nodes 2 turned off. Just a quick reminder that from mode you can mute a node by setting it to never. This is
useful when a workflow is big and has a lot of branches. You can mute a branch of that workflow and it will still run without errors as long as there are no nodes after the muted ones that expect
an input. To turn the node back on, you go to mode and choose always. We also
have shape options for nodes, but these are only decorative. They just change the corners of the node. By default, the corners are rounded. There is also the
card option which rounds only two corners. Personally, I do not think it is worth spending much time on this. To
remove a group, you can select it and press the delete key or you can right-click on it, choose edit group, and then remove. This only removes the group container, not the nodes inside it, unlike folders in other systems. Let us select these two nodes while holding control and press delete to remove them.
Then add an image crop node and connect the link to it. After that, let us add a preview image node. Since we are only testing with these settings, we get a
crop from the top left corner of the image. Let us arrange the nodes. Then hold control, select both nodes, and press Ctrl + C to copy and then Ctrl + Shift + V to paste them with the links.
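The image crop node simply cuts a rectangle out of the picture, defined by a width, a height, and x/y offsets. A rough sketch of the same idea in Python with Pillow, using hypothetical coordinates on a blank 512x512 canvas:

```python
from PIL import Image

def crop_region(img, x, y, width, height):
    """Mimic the image crop node: cut a width-by-height box starting at (x, y)."""
    return img.crop((x, y, x + width, y + height))

# A blank 512x512 stand-in for the loaded image.
img = Image.new("RGB", (512, 512), "white")

# Same crop size, different x/y offsets, like the copied branches:
top_left  = crop_region(img, 0,   0, 256, 256)
top_right = crop_region(img, 256, 0, 256, 256)
print(top_left.size, top_right.size)  # both come out as (256, 256)
```

Changing only x or only y slides the crop window across the image, which is exactly what we do next with the pasted branches.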
Since we have them copied, let us paste again to get a third branch. Right now,
all the settings are the same. So, all
three give the same result. We can
change the x coordinate on one to get the top right corner of the image. For another one, let us change the y-coordinate to get the bottom left corner. Now, when we run it, we split the image into three pieces. You could
add another one for the bottom right corner to get the missing part. That is
homework for you to figure out the correct coordinates. Now if we change the input image and run it again, you can see how useful this can be. In a
later episode, we will learn how to load multiple images from a folder and automate this process so we can apply it to all images in a folder. Now select
all the nodes in the workflow using the shortcut Ctrl + A and then press delete to remove everything. It is time for another short break. This has been a long chapter and I want to make sure you have time to absorb the information.
Take a few minutes, press pause, get a drink, or step away from the screen, and then come back. Now, I do not know what learning method works best for you, but
I can tell you one method that usually works very well for video tutorials.
First, watch the entire tutorial from start to finish without stopping too much. This helps you build a general understanding of what is possible and how things fit together. Then watch it a second time, pause the video and follow
along step by step inside Comfy UI.
After that, try to repeat the same steps without the tutorial playing just from memory. Once you are comfortable, start experimenting.
Try changing nodes, parameters, or settings that were not covered in the tutorial. And if something does not work or you get stuck, that is completely normal. You can always go back to the tutorial, rewatch a part or ask questions on Discord. Learning Comfy UI is not about speed. It is about understanding.
It is time to build a workflow from scratch. But first, let us go to workflows. Open the getting started folder and open workflow number one, the one we used in a previous chapter. What
I want to do now is give you an analogy so you can understand what is happening here with all these nodes connected. So
it makes more sense. I will use a note node and add some info next to each node. You do not have to do that. Just
node. You do not have to do that. Just
watch and pay attention. I will open the same workflow but with those notes added next to each node and I will explain each one in detail. You probably noticed by now that when we generate images with
AI, we usually download a file called a model. Sometimes people call it a model and sometimes they call it a checkpoint.
In practice, they usually mean the same thing. A model is the trained AI itself.
It contains everything the AI learned during training like styles, shapes, and how images are formed. The word
checkpoint comes from machine learning.
During training, the model is saved at different points in time called checkpoints. Those checkpoints are what
checkpoints. Those checkpoints are what we download and use. So when you hear model or checkpoint, you can think of them as the same thing, the trained AI
file that does the image generation. In
Comfy UI, you will often see the term load checkpoint, but what you are really doing is loading the model you want to use. We can think of the model as the photographer we want to hire. The load
checkpoint node is the step where we actually hire that photographer.
Depending on what the photographer learned during training, they will be good at different types of photos. That
is why there are so many different models available. Just like in real life, some photographers specialize in portraits, others in landscapes, macro photography, or food photography. AI
models work in a very similar way. The
better and more complex the training of a photographer, the more expensive they usually are. In our case, that cost is not money, but computer power. Larger
and more advanced models usually need more VRAM and a stronger graphics card to run properly. To keep things simple for now, we are hiring one photographer.
We will use a model called Juggernaut Reborn. And this is the photographer that will generate our images. So now
that we hired the photographer, what comes next? We need to give instructions to that photographer about what we want to get and what we want to avoid. These
instructions are called prompts. We
usually use a positive prompt to describe what we want to see in the image and a negative prompt to describe what we want to avoid. In Comfy UI, we use the same node for both. I just
colored one green for the positive prompt and one red for the negative prompt so they are easier to recognize.
The node we use is called CLIP Text Encode. This node takes our written text and translates it into a form that the model can understand. In simple terms, CLIP Text Encode acts like a translator
between human language and the AI. It
turns words into instructions that the photographer can follow during the photo shoot. Besides giving instructions on how the photo should look, we also need to decide how big the photo will be. For
that, we use the empty latent image node. This node is like choosing an empty photo paper before taking the photo. Here is where we decide the width and height of the image. We are defining the size of the photo before it even exists. At this stage, there is still no image. It is just an empty space where the photo will be created. Once the
photo shoot happens, the final image will always respect the size we set here. Now, it is time for the photo shoot. The K sampler node is the photo shoot itself. This is where the photographer follows the instructions from the prompts and uses the empty photo paper to take the photo. Each
different seed is like taking a new photo of the same scene. The idea is the same, but the result is slightly different every time. The K sampler
controls how the image is generated. It
decides how many steps the photographer takes, how much randomness is allowed, and how closely the final photo follows the instructions. You do not need to understand every parameter right now.
What matters is that the K sampler is the core of the workflow where the actual image creation happens.
Everything before the K sampler prepares the photo shoot. Everything after it finishes the photo. After the photo shoot, the image is created, but it is
not visible yet. That is because the K sampler does not produce a normal image.
It produces something called a latent, which you can think of as a hidden version of the photo. It contains the information of the image, but it is not in a format we can actually view. This
is where VAE decode comes in. The VAE
decode node is like the darkroom in photography. The photo already exists, but it still needs to be developed to become visible. So, the VAE decode takes that latent result and converts it into a real image that we can see, preview, and save. Without this node, the workflow can still generate something, but you would not be able to view the final photo because it is still in that hidden latent form. And finally, the
save image node is where the finished photo is delivered to the client. After
the VAE decode step, we usually add a node that either previews or saves the image. Preview nodes let us see the result inside Comfy UI, while the save image node writes the final image to disk. Without one of these output nodes, the workflow has no final result. In our
photo studio analogy, this is the moment where the developed photo is either shown to the client or delivered as the final file. Now, let us zoom out and look at the entire process. First, we
load a model from our disk. This is like hiring a photographer. Then, we give instructions. The positive prompt describes what we want. For example, a close-up portrait of a pet. The negative
prompt describes what we want to avoid.
For example, saying we do not want dogs.
Next, we decide how big the photo should be using the empty latent image node.
This is where we choose the size of the photo before it is taken. Now, let us run the workflow. You can see that all these instructions are passed into the K sampler where the image is actually
created. The K sampler is the photo shoot. It uses steps and different settings similar to camera settings like shutter speed or aperture to decide how the photo is taken. After that, the
image goes through VAE decode where it is converted from latent space into actual pixels. This is like developing the photo in a darkroom and finally we save the image. This is when the photographer delivers the finished photo
to the client. Every image generation workflow in Comfy UI follows this same basic idea even when it becomes more complex. Let us do some quick experiments. What happens if I change the negative prompt, the instructions where we say what we want to avoid? For
example, if I say I do not want a cat, it will probably give me another pet that is not a cat and we might get a dog instead. If we run it again, it is like taking another photo of a pet because the seed is random. Now we can change the seed to be fixed. When the seed is
fixed, each time we use the same prompt, the same settings, and the same seed, we should get the exact same image. If I
try to run it again, you can see that nothing happens. The result would be the same. So, Comfy UI does not even bother to generate it again. If we change a setting like the seed, then it lets us generate again and we get a different image. If we go back to the previous seed, we are back to the same image we had before generated with that seed. Now
that we kind of understand how it works, let us click on this plus sign and build the same workflow from scratch. Double
click on the canvas and search for load.
Usually it is either load checkpoint or load diffusion model. But in some cases there are special loaders for specific models. Now that we have the node, we select the model. Since we did not download more models yet, we only have one, Juggernaut Reborn. So we hired our photographer. Now let us give it instructions. Search for prompt. And we can find this CLIP Text Encode node.
Let us move it next to the other node. I
like to change the color to green for the positive prompt. Right click and clone the node or just hold alt and drag the node to make a copy. For this second one, let us change the color to red.
Again, this does not influence how it works. It is the same node. It is just visual. For the positive prompt, I will add close-up portrait of a pet. For the
negative prompt, I will add cat. Not all
models use negative prompts. Some older
models like this one still use it, but you will see later that some newer models are smarter and do not need a negative prompt and they work better when the negative prompt is disabled.
Can you guess how these are connected?
We have clip on both input and output.
So, we can only connect clip to clip. If
we try to drag from the model, you can see it does not work. And the same if we try from the VAE. So, let us connect the clip output from the model to both of
the text encoders. Now we have the instructions for how the image should look but we still need to define the size. Let us search again using the word empty and add the empty latent image node. There is also a newer one that we will use later for newer models but for this workflow we will use this simple one. I like to change the color of this node to purple but you can leave it as it is if you want. Now we have width and height. Because we work with computers, most models work better with values that are multiples of 64 or 8. That is why we see values like 512 instead of 500. I know that this model was trained with square images at 512x512 pixels. So I
use these values to get better results.
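Snapping an arbitrary size to the nearest friendly multiple is simple arithmetic. This tiny helper is my own sketch, not a ComfyUI function, but it shows why 500 becomes 512 when working in multiples of 64:

```python
def snap(value, multiple=64):
    """Round a dimension to the nearest multiple, e.g. 500 -> 512 for 64."""
    return max(multiple, round(value / multiple) * multiple)

print(snap(500))      # 512: 500 / 64 rounds to 8, and 8 * 64 = 512
print(snap(700, 64))  # 704: 700 / 64 rounds to 11, and 11 * 64 = 704
```

The max() guard just keeps a very small input from collapsing to zero; the rest is plain rounding.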
Some newer models are trained with larger images and can generate bigger images. But that comes at a cost. Just like printing a big photo costs more than a small one, a bigger image takes more time to generate and sometimes your PC cannot handle it. More about that later. Now, let us add the most important part where the magic happens, the K sampler. As you can see, this node has four inputs where it takes all the
instructions and one output. First, we
connect the model since it has the same color and name. Then, we connect the conditions. The instructions are yellow.
Even if the names are different, we connect the positive output to positive and the negative output to negative.
That is how it knows which one is positive and which one is negative even though they come from the same type of node. The last input is the empty latent image which defines the size of the image we want. Now we have everything needed to generate the image but it is
still in latent format. We need pixels to actually see it. So let us drag a link from the output and you can see that it suggests VAE decode. We select
it and now the image is decoded like a dark room where the photo is developed.
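To give a feel for what the hidden latent actually is, size-wise: for Stable Diffusion style models the latent is roughly the image shrunk by a factor of 8 in each dimension, with 4 channels. The factor and channel count below are the commonly cited SD 1.5 values, used here only as an illustration:

```python
def latent_shape(width, height, channels=4, downscale=8):
    """Shape of the hidden latent for a given image size (SD 1.5 style)."""
    return (channels, height // downscale, width // downscale)

# A 512x512 empty latent image is really a small 4x64x64 block of numbers.
# Working on this compressed form is why diffusion in latent space is fast,
# and the VAE decode step expands it back into visible 512x512 pixels.
print(latent_shape(512, 512))  # (4, 64, 64)
```

So the K sampler never touches full-resolution pixels; it works on this compressed block, and VAE decode is the step that blows it back up into an image we can see.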
Here we also have a VAE input. In this
case the VAE model is included inside the main model which is why we can connect it directly. In some cases the VAE comes as a separate file and then we
use a load VAE node. You will see that later. Now the last step is to save the image. So we add the save image node.
Let us run the workflow and see if it works or if we forgot something. If
everything turns green, it worked without errors. There are cases where the image does not look right. Even if
there are no errors, that usually means some settings are not ideal. People who
create AI models usually provide recommended settings, especially for the K sampler. Just like in photography, macro and landscape use different camera settings. The same idea applies here. If
we look at the previous workflow, we can see recommended settings for this model.
Steps 35, CFG 7, this sampler, and this scheduler.
So let us change steps to 35, CFG to 7.
For sampler, we use DPM++ 2M and for scheduler we use Karras. Now let us run it again.
For this seed, we get some small deformations, but for the next seed, it looks fine. We will see later how to improve the results even more. Let me
show you what happens when we try to generate an image that is much bigger than what the model was trained to handle. For this example, I will double the image size. On the first try, I did not even get a pet. Sometimes you might get something that looks okay, but most
of the time you will see problems. You can get strange deformations, things that do not make sense, or visible mutations. If I increase the size even more, these problems become even more obvious. It also takes more processing power and more time to generate the image. The reason this happens is because this model was trained mainly on 512x512 pixel images. When we ask it to generate a much larger image, it
struggles to understand the full space.
You can think of it like the model trying to generate the image in parts.
One part might look okay, but then it tries to continue the image next to it, almost like stitching pieces together, and that is where things break. That is
why you sometimes see double heads, repeated objects, or strange structures in large images. Bigger images are not always better if the model was not trained for that size. But if a model is
trained to handle larger images, you can get more details and better results. Let
us say I add ugly to the negative prompt. So we push the result toward more beautiful images. For the positive prompt, let us be more specific. We want
a dog and we want it to be beautiful.
Now when we run it, we get a more beautiful dog. Because this model is really old, like I told you, it is good for practice. But today, we have much bigger and more accurate models. They produce better results with fewer deformations, but they are larger and need more VRAM to run properly. Our
desktop computers are very similar to Comfy UI because they are both built around the idea of connecting specialized components together where each one does a specific job. The CPU
acts like the central processor just like the sampler or the model does the main work in Comfy UI. The monitor is like preview and output nodes that show results. The keyboard and mouse are inputs just like prompts and parameters.
Printers and speakers are output devices like save image or audio nodes. Routers
handle communication similar to data links between nodes. The reason we design systems this way is because breaking complex tasks into smaller
connected parts makes them easier to understand, easier to control, easier to upgrade, and more flexible. That is
exactly why Comfy UI uses nodes instead of hiding everything behind a single button. Now that we know how to create a workflow, we also need to learn how to save it. If you look at the top, you can see it says unsaved workflow. That means
none of these settings or nodes are stored yet. If you want to reuse the
stored yet. If you want to reuse the same workflow later without recreating everything from scratch, you need to save it. If I click on this arrow next
save it. If I click on this arrow next to the workflow name, you can see there are several save options. Personally, I
prefer using the main menu. So, I go to file and here we have save, save as, and export. When you click save and the
export. When you click save and the workflow has never been saved before, Comfy UI will ask you to give it a name and choose where to save it. If the
workflow was already saved and you just made changes, clicking save will overwrite the existing file with the same name. Save as lets you save the
same name. Save as lets you save the same workflow under a different name.
This is the option I use the most, especially when I want to create variations of a workflow. Export is very useful because it is not limited to the Comfy UI workflow folder. It allows you
to save the workflow anywhere on your computer, even outside the Comfy UI folder. The API option is mainly used
folder. The API option is mainly used when working with online or cloud-based workflows. So, we will not use it here.
workflows. So, we will not use it here.
So, let us click export. Now, it asks for a name. Choose a name that makes sense to you. Click confirm. Then,
choose where to save it. For example, I can save it on my desktop. You can see that the file is saved with the JSON extension. This JSON file contains all the nodes, connections, and settings of your workflow. This file is your workflow, and you can open it anytime, share it with others, or modify it later. JSON files are simple text files. You can open them with any text editor like Notepad. JSON stands for JavaScript Object Notation, and it is just a structured way of writing text so both humans and computers can read it. In Comfy UI, the JSON file stores things like node types, connections, parameters, and settings, all written as text. That is why workflow files are small in size and easy to share. They do not include images or models, only instructions. If we go to workflows, you
instructions. If we go to workflows, you can see I have that folder with workflows saved there. You can do that, too.
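By the way, since a workflow file is just JSON, you can sketch what one looks like in a few lines of Python. The keys and node names below are purely illustrative, not the real ComfyUI schema, which differs between versions:

```python
import json

# A tiny, hand-made example in the spirit of a ComfyUI workflow file.
# Every key and node type here is an illustrative assumption.
workflow = {
    "nodes": [
        {"id": 1, "type": "CheckpointLoaderSimple", "widgets_values": ["model.safetensors"]},
        {"id": 2, "type": "CLIPTextEncode", "widgets_values": ["a beautiful dog"]},
        {"id": 3, "type": "KSampler", "widgets_values": [42, "fixed", 35]},
    ],
    "links": [[1, 1, 0, 2, 0], [2, 2, 0, 3, 1]],  # connections between node slots
}

# Saving and loading is plain text, which is why workflows are tiny and shareable.
text = json.dumps(workflow, indent=2)
loaded = json.loads(text)
print([node["type"] for node in loaded["nodes"]])
```

This is why a workflow file stays a few kilobytes: it records instructions, never the models or images themselves.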
Go to the menu, go to File, choose Save As, give it a name, and confirm. Now if we go to workflows, we can see the workflow is saved there. Right now it is not organized into any folder; it is just in the main list. But you can add a folder name in front of the workflow name when you save it. For example, folder name, then a forward slash, then the workflow name. Let us see where it is saved. Go to your Comfy UI folder, then inside the Comfy UI folder go to user, then default, then workflows. Here you can see your saved workflow and also the folder I created for this course that comes with the easy installer. You can create your own folder manually. For example, I can create a folder called my workflows, then drag that workflow into it. Now, if we go back to Comfy UI, nothing changes immediately because Comfy UI usually reads this when it starts. But we can refresh using this refresh button. Now our folder appears there and we can see the workflow inside it. I suggest organizing your workflows like this because over time you will have a lot of workflows and it becomes hard to keep track of everything. By the way, you can also use the search bar to search for a workflow by name. We also have a bookmark icon. If we click it, the workflow is added to the bookmarks at the top, so the ones you use the most stay there. If you click the bookmark again, it is removed from the favorites list. Let us collapse this and I will show you one more thing. If we go to the desktop and open the shortcut for the output folder, or if we go directly to the output folder, you can see all the
images generated so far with Comfy UI.
The last one is this dog. You probably
did not think about this yet, but if you open an image generated with Comfy UI in Notepad, you can actually see some code at the beginning. Just like with workflows, this happens because Comfy UI
attaches the workflow to the image when it saves it. After that, there is the image data which we cannot really read.
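That embedded text is ordinary PNG metadata, and you do not need ComfyUI to peek at it. As a rough illustration of how text metadata lives inside a PNG, here is a standard-library sketch that builds a fake PNG carrying a workflow and reads it back (real files may store it in slightly different chunk types, so treat this as illustrative):

```python
import struct
import zlib

def png_chunk(ctype: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: 4-byte length, 4-byte type, data, CRC over type+data."""
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", zlib.crc32(ctype + data))

def read_text_chunks(png: bytes) -> dict:
    """Walk the chunk list and collect tEXt entries (keyword, NUL byte, text)."""
    assert png[:8] == b"\x89PNG\r\n\x1a\n"  # fixed PNG signature
    out, pos = {}, 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, value = data.partition(b"\x00")
            out[key.decode()] = value.decode()
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return out

# Fake a minimal PNG that carries a workflow the way ComfyUI does: as a text chunk.
demo = (b"\x89PNG\r\n\x1a\n"
        + png_chunk(b"tEXt", b"workflow\x00{\"nodes\": []}")
        + png_chunk(b"IEND", b""))
print(read_text_chunks(demo))
```

The image pixels live in other chunks, which is why the workflow text sits readably at the start while the rest looks like gibberish in Notepad.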
This means that every image has the full workflow embedded in it, including all the settings and prompts. Let me drag this image onto the Comfy UI canvas so you can see what happens. Now you can
see that it loads as a workflow with the file name. If we generate again, we get exactly the same image because it uses the same seed and settings. Let us go back to the output folder and drag a different image, for example, this robot. Now it loads that workflow, and if we run it, we get the exact same robot. This is very useful. Another thing you might notice is that all images start with the word ComfyUI followed by a number. This happens because in the save image node, the prefix is set to that value. We can change it. For example, I can set it to pixa, and now when I run the workflow, the image file name will start with that word followed by a number. As you can see here, if you hover over the prefix field, you can get more information about how to format it. You can include things like the date and other values in the file name. Now, let us change it again. I will add a folder name, for example, my images, then a forward slash, then the image prefix. When we run the workflow now, the images will be saved inside that folder. Let us go to the output folder. You can see we now have a folder called my images, and inside it we have the images that start with the prefix we set, followed by a number. Now, I will go back to the workflows folder that we created earlier and delete it.
Back in Comfy UI, if we refresh the workflows list, you can see that the folder is gone. We are left only with the getting started folder we used for this episode. When you create your own folders and organize your workflows, I suggest naming them in a way that makes sense. You can name them by base model, like SDXL workflows or Flux workflows, or by function, like text to image workflows, inpainting workflows, or video workflows. Choose whatever makes the most sense to you, but organizing your workflows early will save you a lot of time later.
In this chapter, I want to show you how Comfy UI is organized on your disk. This is important because sooner or later, you will need to know where to place models, images, workflows, and custom nodes. Do not worry if this looks overwhelming at first. You do not need to understand everything right now. I will focus only on the folders you actually need as a user. This is the main Comfy UI easy install folder. Think of this as the main workspace that contains everything Comfy UI needs to run. The most important things here are the Comfy UI folder, the Python embedded folder, the add-ons folder, and the batch files used to start or update Comfy UI. In normal usage, you will mostly work inside the Comfy UI folder. If you have a different version of Comfy UI, you will not have the add-ons folder and some of the BAT files will be named differently, but pretty much everything else should be similar. When we open the Comfy UI folder, we see many files and folders. Most of these are internal files used by Comfy UI itself. As a beginner, you do not need to touch most of these. The important folders for us are models, input, output, custom nodes, and the user folder. The models folder is where all AI models live. This includes checkpoints, LoRA files, VAEs, ControlNets, upscalers, and more.
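The type-to-subfolder organization can be sketched as a small lookup table. The folder names below are assumptions based on a common ComfyUI models layout; check your own models folder for the exact spelling:

```python
import os

# Assumed mapping from model type to subfolder inside ComfyUI/models.
# These names are illustrative; verify them against your own installation.
MODEL_FOLDERS = {
    "checkpoint": "checkpoints",
    "lora": "loras",
    "vae": "vae",
    "controlnet": "controlnet",
    "upscaler": "upscale_models",
}

def destination(models_root: str, model_type: str, filename: str) -> str:
    """Return where a downloaded model file should be placed."""
    return os.path.join(models_root, MODEL_FOLDERS[model_type], filename)

print(destination("ComfyUI/models", "lora", "my_style.safetensors"))
```

The point is simply that every model type has one expected home; a file dropped into the wrong subfolder is invisible to Comfy UI.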
Inside the models folder, everything is organized by type: for example, checkpoints or diffusion models for the main image generation models, loras for LoRA files, vae for VAE files, and controlnet for ControlNet models. When a workflow tells you to download a model, it will also tell you exactly which subfolder to place it in. If a model is not placed in the correct folder, Comfy UI will not see it. The input folder is where you place images that you want to load into Comfy UI, for example, images used for image to image, ControlNet masks, or reference images. Any image you place here will be visible inside Comfy UI when using a load image node. The output folder is where Comfy UI saves generated images by default. Every time you use a save image node, the result will appear here. This makes it very easy to find all your generated images in one place.
The custom nodes folder is where all custom nodes are installed. These are extra features added by the community. Each folder here represents a custom node package. For example, we already used the rgthree node, and we will use more later. When you install nodes using the manager, they usually end up here automatically. If a custom node is missing or broken, this is usually the first folder you should check. Inside the user folder, we have user-specific data. The most important part for us is the workflows folder. This is where Comfy UI stores workflows that you save from inside the interface. These workflow files are saved as JSON files. The add-ons folder is specific to the easy install version. It contains extra tools, optimizations, and helper scripts. You usually do not need to touch this folder unless a tutorial specifically mentions it. You do not need to memorize this right now, but this structure might change as new tools are created by Ivo. For example, this BAT file lets you link a folder with models from another Comfy UI installation. This one installs the Nunchaku node, and this one installs Sage Attention. There are also different Torch package versions for more advanced users who need a specific version for certain custom nodes. You will also find extra tools, like one for Windows 10 that enables long paths so Comfy UI can download models even if the path is very long. There is also an update folder with BAT files, but as you will see later, for the easy install version I recommend using different update BAT files. The Python embedded folder includes a self-contained Python installation. This helps avoid conflicts with other software and makes Comfy UI easier to run and update. As you use Comfy UI more, this folder structure will start to make sense naturally. In
the next chapters, I will always tell you exactly where things need to go. Let
us talk a little bit about updates and custom nodes. What you are seeing here is the Comfy UI easy install folder. This setup already includes everything needed to update Comfy UI safely. The most important rule is this: always close Comfy UI before updating. Never update while it is running in the browser. Start Comfy UI BAT: this only launches Comfy UI; it does not update anything. Update Comfy UI BAT: this updates the core Comfy UI code. Use this when you want the latest features or fixes. Update Comfy UI and Nodes BAT: this updates Comfy UI and all installed custom nodes. Update easy install BAT: this updates the easy install system itself. When should you update? Update when something is broken. Update when a node requires a newer version. Update when you want new features. Do not update right before an important project; updates can sometimes break workflows. If something breaks after an update, you can usually fix it by updating again or removing the last custom node you installed. One important reminder: Comfy UI moves fast. Stability comes from not updating every single day. If everything works, it is okay to stay on your current version. At some point, you will mess up Comfy UI. Maybe a node breaks, or some dependencies get messed up, or an update has bugs. But remember, you can always do a fresh install when that happens. Just create a new folder and reinstall using the easy installer. Let us double-click on update easy install. This updates only the easy installer and adds extra tools and add-ons. As we move forward in this series, more models will appear, new nodes will be added, and Ivo likes to create scripts that make these installations easier. When you see that the installation is complete, you can read more about the new release using this link, or press any key to exit. You
may not see any changes immediately, but if we go to the add-ons folder, you can see that we now have more BAT files than we had in the first chapter. Now, let us go back to the main folder and try to
update Comfy UI to see if everything still works or if we break some nodes.
Nodes sometimes break after an update because Comfy UI itself changes how things work internally. Many custom
nodes are made by independent developers, not by the Comfy UI team.
These custom nodes often rely on specific Comfy UI behavior, internal APIs, or extra Python libraries and dependencies. When Comfy UI updates,
those assumptions can change and the node stops working until its creator updates it to match the new version. So,
Comfy UI started after the update, but let us open the command window to see if everything worked correctly. Usually,
after startup, you can see import times for custom nodes. As you saw before, all custom nodes are inside the custom nodes folder. But look what happened here.
After the update, one of the installed custom nodes, TeaCache, failed to import. That means if you have a workflow that uses that node, it will not work. If you do not use that node, you can ignore it and try updating Comfy UI again in a few days to see if it gets fixed. I will close Comfy UI now and try something else. Sometimes there are newer versions of the custom nodes, and if the author fixed the issue, updating the nodes can fix the problem. So this BAT file updates only Comfy UI, and this one updates both Comfy UI and the custom nodes. This process can take a while because it updates all the nodes. So I
will speed it up. Comfy UI started. So
let us check the command window to see if the issue was fixed. The node was still not fixed. This means that at the time I recorded this video, the update from that day broke that node. When you
watch this video, it might already be fixed and work for you. Either because
Comfy UI fixed a bug, the node creator patched the node, or a new developer created a replacement node. There is one more thing I want to try. We can go back to an older version of Comfy UI that did
work, a version that had the right conditions for that node. The downside
is that if Comfy UI released new features or nodes for newer models, those might not work on the older version. So, it is always a compromise. You have to choose between keeping a specific custom node working or using the latest Comfy UI updates. Ivo hid this option so beginners do not accidentally mess up their Comfy UI. Let us go to the add-ons folder, then to tools, and here we have the version switcher. When we run this BAT file, Comfy UI is downgraded to a previous version. In my case, it went from version 0.7 back to version 0.6. If you run this script again, it upgrades Comfy UI back to the latest master branch. Let us press any key to close this. Now that we are on an older version, it is time to check if that node works. Let us start Comfy UI, wait for the interface to load, then open the command window and check the custom nodes. Now it is fixed and there are no errors with the nodes. In a few days I will try updating again to see if it gets fixed in the newer version. But
this is basically how you update and downgrade Comfy UI using the easy installer. Other Comfy UI versions might require you to run commands manually, but I keep pushing Ivo to create BAT scripts for these tasks. I want to spend my time generating, not typing lines of code. You will see that we have the manager here. In other versions, you might find it somewhere else in the menu. Let us open the manager and see what we have here. We also have Update and Update Comfy UI. These are similar to the BAT files, but the BAT files have something extra: they take into account some dependencies needed for certain custom nodes to work, which Comfy UI itself does not handle when updating. For example, for the Nunchaku node to work, it needs specific dependencies, like a certain version of a library. The BAT file updates Comfy UI, but then adjusts or downgrades those dependencies to the versions required by the custom nodes we use. Ivo tries to maintain these BAT files and keep them updated so they stay compatible with the versions needed to run the workflows shown in these video tutorials. Because I am using the easy installer, I did not touch these update buttons inside the manager; I only use the BAT files. If you have a different version of Comfy UI, you will need to use these update options or use a BAT file from the update folder instead. In the manager, you can also find the latest Comfy UI news, such as what was fixed, what is new, and recent changes. At the bottom, you can see the Comfy UI version and the manager version. Most of the time the manager is used for managing custom nodes. If we go to the custom nodes manager, we can see all the available custom nodes created by different developers. There are a lot of them. I personally try to keep the number of installed nodes to a minimum and install only what is essential or what I use most often. Some people install hundreds of nodes, but the more nodes you install, the harder it becomes to keep everything compatible, because each node can have its own dependencies and requirements. If I filter by installed, you will usually not see many nodes here besides the manager itself. However, I
asked Ivo to include a few essential nodes that I use most often. One example
is the rgthree custom node, which includes the image comparer node that is very useful for comparing images side by side. Each custom node has a title and a version number. You can switch versions if needed, for example, when an older workflow only works with a specific version of a node. For each node, you also have several actions available: update only that node, switch the version, temporarily disable it, or uninstall it. You can also see how many individual nodes are included in that custom node package, along with a short description. Some nodes mention possible conflicts with other nodes. If you click on that yellow warning text, you can read more details about those conflicts. These conflicts usually matter only if you use both conflicting nodes in the same workflow. You can also see the author of the node and the number of stars it has on GitHub. Stars are given by users and usually indicate how popular or trusted a project is. Some developers are well-known and consistently release high-quality nodes. That said, there have been cases in the past where certain nodes had security issues, so it is still a good idea to be careful. You can also see when the node was last updated. To switch versions, you click the version selector, choose a version from the list, click select, and
then follow the steps shown. We will not do that right now. As you remember, every custom node that gets installed ends up in the custom nodes folder. Here
you can see all the custom nodes that come with the easy install version at the time of this recording. Now, let us install one node as a test just to see
how the process works. Open the manager.
Go to custom nodes manager and search for a node called align. We will use this as a test because it does not require special dependencies. So in
theory it should not affect Comfy UI too much. Each node entry has a title. If
you click on it, it opens the GitHub page for that node. On GitHub, you can see the code, because every custom node is basically Python code and supporting files. You can also check the issues tab, where users report problems and sometimes solutions are discussed. If you scroll down, you usually find important information like required Comfy UI versions, Python versions, or other dependencies. These are the dependencies I mentioned earlier, things the developer relied on when creating the node. You also see installation instructions, either through the manager, which we are doing now, or manually using commands like git clone, which simply copies the code into the custom nodes folder. Before installing any custom node, it is a good habit to read this information. Some nodes require things your system might not have, and then they will not work. Now let us install this node. Click the install button. You will be asked to choose a version, so select the latest version. The button changes and installation begins. When it finishes, you will see a restart button. Comfy UI needs to restart for the node to become available. Click restart and confirm. Comfy UI shuts down. You will see the browser trying to reconnect while Comfy UI is restarting. After a few moments, you get a confirmation message. Click confirm. The node is now installed. If you go back to the manager, open custom nodes manager, and search for the Align node, you will see that it now shows an uninstall button. If installation had failed, you would see an import failed message instead. If
you look inside the custom nodes folder on disk, you will now see a new folder for this node. It is simply the same code you saw on GitHub copied locally.
This code is what adds new nodes to the Comfy UI interface. If you deleted this folder manually, that would also uninstall the node. However, let us
uninstall it properly using the manager.
Go back to the manager, click uninstall and confirm again. You will be asked to restart Comfy UI. Confirm. Wait for the
restart and then confirm the browser reload.
Now, go back to the custom nodes manager and search for the align node again. You
will see the install button again which means the node is no longer installed.
If you check the custom nodes folder, you will also see that the folder for this node has been removed. This is the basic workflow for installing, updating, and uninstalling custom nodes using the
manager. Sometimes when you download a
manager. Sometimes when you download a workflow from other people on the internet, you will have missing nodes because they used custom nodes that you do not have installed. When you do not
know what nodes they used, you can use the install missing custom nodes button.
This will give you a list of missing nodes and the option to install them.
That said, I personally prefer to install nodes manually so I have full control over what gets installed. That
is why I usually include a note node in my workflows explaining exactly which custom nodes are required. Now let us look at templates. If we open templates,
we can see different workflows created by the Comfy UI team. If we filter by image generation workflows and select something like a Z image turbo text to
image workflow, Comfy UI will first tell us that we have missing models. These
are the AI models required for the workflow to generate images. Usually, it
tells you exactly which folder the model needs to go into and gives you the model name along with a download link or a download button. In this example, you
can see it needs a VAE model and a few other models. Once you download those models and place them in the correct folders, the workflow should work, assuming you have enough VRAM to run it.
In this case, there are no missing nodes. So, let us close this. Now, let
us go to menu, then file, then open, and open a workflow that I know uses missing custom nodes. You will see a message saying the workflow uses custom nodes that are not installed. At first, you might not see any red nodes on the canvas. That is because this workflow uses subgraphs. Subgraphs are basically nodes that contain other nodes inside them. If you have experience with Photoshop, you can think of them like smart objects. When you see an icon with a square and an arrow, you can click it to enter the subgraph. Once inside, you can see the red node that is missing. If we now open the manager and click install missing custom nodes, Comfy UI detects that node and offers to install it. For many nodes, this works perfectly. However, some nodes, like Nunchaku, require additional dependencies and extra setup. We will talk about those in a future episode. The important thing to know is that for many workflows, install missing custom nodes can quickly fix the problem. Let us close this for now. If we open the manager again, you will also see a models manager. This lets you browse and download models by type. Personally, I
rarely use this because a model without a workflow is not very useful. In my
tutorials and on my Discord server, every workflow comes with notes explaining exactly which models you need and where to put them. The Comfy UI templates also clearly list required
models and folders. So, let us do a quick recap.
Use update Comfy UI.bat to update only Comfy UI. Use update Comfy UI and nodes.bat to update Comfy UI and all custom nodes. Use update easy install.bat to update the easy install system and helper scripts. The update folder exists for users with other Comfy UI versions. The add-ons folder only exists in the easy install version. Inside add-ons, the tools folder includes the version switcher, which lets you downgrade or upgrade Comfy UI if needed. This is useful when a new update breaks a node you rely on. Inside the Comfy UI folder, the custom nodes folder contains all installed custom nodes. If you delete a folder from here, you uninstall that node. Sometimes, if a node fails to install correctly, deleting its folder and reinstalling can fix the issue. I
know this is a lot of information. Do
not worry if it does not all stick right away. Practice, experiment, and come
back to this tutorial in a month. You will be surprised how many things suddenly make sense that you missed the first time. Regarding the TeaCache node, after a few days, Comfy UI was updated again and the problem was still not fixed. There is now version 0.8, and even if you downgrade to version 0.7, it is still not fixed. Comfy UI keeps adding updates, and at some point some custom nodes will stop working. If that node is not important for you, you can delete it or uninstall it. You can also just disable it from the manager, or drag the TeaCache folder into the disabled folder so it is disabled. You can move it back out of the disabled folder anytime you want to try it again. In this chapter, I will try to simplify this complex world of diffusion and AI a little. Do not worry if you do not understand everything that is happening. Like I said before, you do not have to be a mechanic and know all the engine parts to know how to drive a car. This is the core idea behind diffusion image generation. The model
does not draw an image all at once. It
starts from pure random noise. This
noise looks like static on a television.
The model then runs a sequence of small refinement steps. At each step, a small amount of noise is removed. Early steps reveal very rough shapes. Later steps reveal clearer forms. Final steps add fine details and texture. Image generation is therefore a gradual process. It goes from noise, to less noise, to recognizable shapes, and finally to a finished image. This slide is a simplified visualization. The real process is more complex. In practice, most diffusion models work in a compressed latent space rather than directly on pixels. A neural network predicts what noise should be removed at each step. Even though the real math is more advanced, this simplified view is enough to understand how diffusion works. It's like sculpting: you start with a rough block and remove material until the shape appears. Or like a foggy window clearing up step by step. You
don't instantly get a sharp scene. It
resolves gradually. Let's open comfy UI.
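As a side note, the gradual refinement described above can be sketched as a toy loop in plain Python. This is only an illustration of the shape of the process, not the real diffusion math: the "image" here is just a list of numbers, and the "denoiser" simply shrinks the remaining noise a little at every step.

```python
import random

# Toy illustration of reverse diffusion: start from pure noise and
# remove a fraction of the remaining noise at every step.
# This is NOT the real math, just the shape of the process.
random.seed(0)
steps = 35
image = [random.uniform(-1.0, 1.0) for _ in range(8)]  # pure noise

for step in range(steps):
    remaining = steps - step
    # each step removes 1/remaining of the noise still left,
    # so the final step removes everything that remains
    image = [value * (1.0 - 1.0 / remaining) for value in image]

# after the last step the "noise" is fully removed (all zeros here);
# in a real model, what remains would be the generated image
```

In a real sampler, the amount removed at each step and the direction of the update come from the neural network's noise prediction, not from a fixed formula like this.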
Go to workflows and, from the getting started folder, pick workflow 1, which is the basic text to image example. Even if we cannot fully see what is happening inside the KSampler step by step, we can still get a good idea of the overall process. Remember, what we see here is a simplified representation of what is actually happening under the hood. First we want a fixed seed. We will see later that each seed starts with different noise. Right now we are using 35 steps, which is enough for this model to produce a clear image like this robot. If we change the steps to one, you can see that the model does not have enough time to remove the noise, so the image is very unclear with these settings. If we add another step, the change is subtle. Adding another one, you can start to see something forming. By step four, we can almost see a face. We can automate this process to see the changes faster. Double click on the canvas and add a primitive node. Like you saw in an earlier chapter, we can adapt this node for different fields. Drag a connection from the primitive node and connect it to steps. Now we have control over the steps, including what happens after each generation. Instead of fixed or random, choose increment. After each run, the value increases by one. So now we have five steps. If we run it again, we get six steps and the image starts to change more. As more steps are added, more noise is removed and the image becomes clearer. Next to the run button, there is a small down arrow. From here, select run instant. This means we can click run once and it will keep running until we stop it. You can see the workflow now runs automatically. On each run, more steps are added and the image keeps refining. You may also notice that as the number of steps increases, it becomes harder for the computer. Just like climbing many stairs, more steps mean more effort, so generation becomes slower and slower. Soon we reach around 35 steps, which is recommended for this model to get a nice clear image.
Although some results already look good around 20 steps. Now we want to stop this. Click the arrow again and switch back to run. After the current generation finishes, it will stop. There is also another way to see a small preview of what is happening inside the KSampler. From the menu, you can go to settings, but it is faster to access the settings from here. In the settings search bar, type preview. You will see an option called live preview method. By default, it does not show anything, but if we set it to auto, we can see a small preview during generation. Let's delete the primitive node, then change the seed to random. Now when we run the workflow, we can see a small preview of what the image might look like before it even finishes generating. Let us change the steps to 30 and run again. You can now quickly see what is happening in the diffusion process. Even though this preview is low resolution, you can clearly see how the image becomes more and more defined as noise is removed. Now, let me try something more drastic. I will use a very large image size. On some computers, this might crash ComfyUI or take a very long time to generate. I will run it again with these settings. You can see that generation is now very slow, but the preview lets us observe how the image slowly starts to appear. This is a bit too slow, so I will cancel the generation here. Instead, I will try a slightly smaller image, still larger than what the model is comfortable with, just so we can see the preview updating more slowly. Now we can clearly see the diffusion process updating every few seconds. The speed of this preview is also influenced by the sampler and the scheduler. As you may remember, models are trained on specific image sizes. If a model was not trained on large images, it treats them more like multiple smaller images stitched together. For example, our Juggernaut model was trained on 512 pixel images only.
Personally, I prefer not to keep the live preview enabled all the time because it can slightly slow down generation. So I will go back to settings and set the preview option back to default. I will also reset the image width and height. You may notice that the preview is still visible. This can happen because something remains in memory. To fix this, I will press F5 to refresh the browser. Keep in mind that refreshing the browser will reload only the current workflow. If you had other workflows open and did not save them, they will be lost. Now everything is back to normal without the preview.
There are still more useful things to learn. This slide explains how a diffusion model is trained. This is not image generation yet. During training, the model is shown millions of images paired with text descriptions. For example, images of cats, people, objects, lighting styles, and environments. The training process uses something called forward diffusion. Forward diffusion means gradually adding noise to a clean image. At first, only a small amount of noise is added. Then more noise is added step by step. Eventually, the image becomes almost pure noise. At each step, the model is trained to predict what noise was added. In other words, it learns how images break down as noise increases. By repeating this process across millions of images, the model learns patterns. It learns what shapes look like. It learns what objects look like. It learns how lighting and structure behave. The goal of training is not to memorize images. The goal is to learn how to reverse this process later. Training a diffusion model requires massive datasets and powerful hardware. In ComfyUI, we are only using the result of that training.
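As a rough sketch in plain Python, with made-up numbers rather than real training code, forward diffusion looks like this:

```python
import random

# Forward diffusion sketch: gradually mix noise into a "clean image"
# (here just four numbers) until almost nothing of it remains.
random.seed(0)
clean = [0.5, -0.2, 0.8, 0.1]

noisy = list(clean)
snapshots = []
for step in range(10):
    # keep 80% of the current values and blend in 20% fresh noise
    noisy = [0.8 * v + 0.2 * random.uniform(-1.0, 1.0) for v in noisy]
    snapshots.append(list(noisy))

# During training, the model sees snapshots like these and learns to
# predict the noise that was added, so it can later reverse the process.
```

The real process uses a carefully designed noise schedule rather than a fixed 80/20 blend, but the idea is the same: clean image in, progressively noisier snapshots out.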
Now that the model has learned how noise works during training, we can use that knowledge in reverse to generate images. This slide shows the difference between training and image generation. During training, the model starts with a clean image. Noise is added step by step until the image becomes pure noise. This is called forward diffusion. This process teaches the model how images break down when noise is added. During generation, the process is reversed. We start from pure random noise. The model removes noise step by step to create an image. It is important to understand this clearly: during generation, we do not add noise like in training. We only remove noise, using what the model learned before. This slide explains an important concept that is often misunderstood. The model does not store images in memory. During training, the model never saves photos that it has seen. Instead, it learns patterns and relationships.
It learns what shapes look like. It learns what objects look like. It learns how parts of an image relate to each other. For example, it learns that faces usually have eyes in a certain position. It learns that animals have specific structures. It learns how lighting, shadows, and perspective usually behave. All of this knowledge is stored as probabilities inside the model, not as pictures, but as learned rules. You can think of it like learning a language. You do not memorize every sentence you read. You learn grammar and structure. The model works the same way. It learns visual grammar, not individual images. When the model generates an image, it is not copying anything it has seen before. It is using learned patterns to guide the noise removal process. That is why results can look familiar but are still new images. This is why changing the prompt changes the result. The prompt activates different learned patterns inside the model. That is also why the same model can generate many different images even though it was trained only once. So far we talked about diffusion in a simplified way, as if it happens directly on images. In reality, most modern diffusion models do not work directly on pixel images. Instead, they work in something called latent space.
Pixel space is the image as we normally see it. It is made of pixels with width, height, and color values. Latent space is a compressed representation of that image. It keeps the important structure and information but removes unnecessary detail. You can think of latent space as a simplified version of the image that is easier for the model to work with. To move between pixel space and latent space, the model uses a VAE. VAE stands for variational autoencoder. The VAE has two main jobs. First, it encodes a pixel image into latent space. Second, it decodes a latent image back into pixels. During image generation, diffusion happens in latent space. After the denoising process is finished, the VAE decodes the result back into a visible image. Working in latent space makes diffusion much faster. It also uses less memory and less computing power. This is why models like Stable Diffusion can run on consumer graphics cards. Without latent space, image generation would be much slower and more expensive. In ComfyUI, this is why we see nodes like VAE Encode and VAE Decode. When we generate images from text, the model works in latent space, and VAE Decode converts the result into pixels we can see and save. This also explains why image resolution and VAE selection can affect results.
Now we look at how text prompts influence image generation. The prompt does not act only once at the beginning. During diffusion, the prompt is used at every denoising step. At each step, the model checks whether the image is moving closer to what the text describes. You can think of the prompt as guidance. It gently nudges the image in the right direction while noise is being removed. This happens repeatedly, step by step, until the final image is formed. CFG stands for classifier-free guidance. CFG controls how strongly the prompt influences the denoising process. With a low CFG value, the model follows the prompt loosely and allows more randomness. With a high CFG value, the model follows the prompt more strictly and forces the image to match the text more closely. Here is a quick example. You can find CFG here in the KSampler. Too low a CFG can produce images that ignore the prompt. Too high a CFG can produce images that look unnatural or oversharpened. CFG is like telling the model how strict it should be about your instructions. The prompt does not generate the image by itself. The prompt only guides the noise removal process. The image is still created by diffusion in latent space. As you can see with CFG 1, the cat is still a cat, but it is not red like we asked. With CFG 7, the result is much closer to the prompt. That said, this also depends on the model we are using. Smarter or better trained models tend to follow the prompt more accurately. In fact, there are some models where we intentionally use a fixed CFG value of one, which effectively ignores the negative prompt. However, pushing CFG too high can damage the image. It can introduce artifacts or make the result look unnatural. Because of that, we always try to find a balance. The goal is to use settings that give us the quality we want in the shortest amount of time without hurting the final image.
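The usual formula behind classifier-free guidance is simple enough to sketch. At each step the model makes two noise predictions, one conditioned on the prompt and one not, and CFG blends them. The numbers below are toy values for illustration only:

```python
# Classifier-free guidance sketch: blend the unconditional and the
# prompt-conditioned noise predictions.
def apply_cfg(uncond_pred, cond_pred, cfg_scale):
    # cfg_scale = 1.0 returns exactly the conditional prediction,
    # which is why a fixed CFG of one effectively ignores the
    # negative (unconditional) prompt; higher values push harder
    # toward the prompt and can overshoot.
    return [u + cfg_scale * (c - u) for u, c in zip(uncond_pred, cond_pred)]

uncond = [0.1, 0.4]  # toy noise prediction without the prompt
cond = [0.3, 0.2]    # toy noise prediction with the prompt

low = apply_cfg(uncond, cond, 1.0)   # follows the conditional prediction exactly
high = apply_cfg(uncond, cond, 7.0)  # exaggerates the difference between the two
```

Pushing the scale very high exaggerates the difference so much that values shoot past their normal range, which is the numerical version of the oversharpened, unnatural look we saw.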
Now we talk about seeds. Seeds are very important for understanding consistency and variation. A seed defines the starting noise used to generate an image. You can think of it as the initial random pattern the model starts from. When diffusion begins, the model always starts from noise. The seed decides exactly what that noise looks like. If you use the same prompt, the same settings, and the same seed, you will get the same image every time. If you change the seed, you change the starting noise, and the final image will be different. The prompt guides the process, but the seed decides the starting point. Different starting noise leads to different results, even when everything else stays the same. You can think of the seed like rolling a die before starting. If you roll the same number, you start from the same situation. If you roll a different number, the outcome changes. This is a simplified explanation. The seed controls a random number generator used internally by the model. You do not need to understand the math behind it. You only need to know that seeds control repeatability. Let us put it into practice. The seed is this number here. It can start from zero and go up to a very large number, so each seed can produce a slightly different result. If you also change the prompt and settings, you can get millions of different results. We can control the seed behavior. If we set it to fixed, we generate once and the result will never change. To generate something new, we need to change other settings. If we choose increment, after each generation the seed number will increase by one. If we choose decrement, after each generation the seed number will decrease by one. So let us change it to fixed and set the seed to 10. When I generate, I get this robot. Now let us change the seed to 15. You can see that I get a different robot this time, in profile. If I change the seed back to 10, I get the previous robot again, because we used the same prompt, the same settings, and the same seed.
In prompts, the order of the words matters. With this prompt, I got this image because house was first, so the model focused on the house and mostly ignored the car. With newer models, this happens less often. But this is an older model, so the effect is more noticeable. Now, look at what happens if I put car first and then house. This time, we clearly get both a car and a house. Words that appear earlier in the prompt usually have more influence than words that come later. You can think of the prompt as a list of priorities. The model pays more attention to the beginning and gradually less attention as it moves toward the end. On top of that, some words can carry more weight, either because of how the model was trained or because we explicitly give them extra emphasis. Because of this, two prompts with the same words but in a different order can produce noticeably different results. Think of the prompt like giving directions to someone. If you say a red cat sitting on a chair in a room with soft lighting, the most important idea is red cat. Everything after that adds detail, but the core idea comes first. We can also add more weight to a word by using round brackets. Right now, house has more weight, so the model pushes the car into the background and it is no longer the main focus. If I add even more brackets, the influence of house becomes even stronger and now the car disappears completely. If I instead add more weight to the word blue, you will see more blue appear in the generation.
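The bracket behavior can be sketched numerically. A common convention in ComfyUI-style prompts is that each pair of round brackets multiplies a word's weight by about 1.1, and the explicit form (word:1.5) sets the weight directly. The helper below only illustrates the nesting rule, not the real prompt parser:

```python
# Each level of round brackets multiplies the word's weight by ~1.1.
def nested_weight(depth, base=1.1):
    return round(base ** depth, 3)

print(nested_weight(1))  # (house)     -> 1.1
print(nested_weight(2))  # ((house))   -> 1.21
print(nested_weight(3))  # (((house))) -> 1.331
```

So every extra pair of brackets gives a modest boost, which is why stacking several of them can make one word dominate the whole generation.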
One more thing you might notice is that there is no spell check by default. Sometimes it can be useful to turn it on. To do that, go to settings, search for spell, and enable text area widget spell check. Now, words that are misspelled or not part of the dictionary will be underlined.
Now, we talk about denoising steps. Steps control how many refinement passes the model performs during generation. Each step removes a small amount of noise. The image is not created in one action. It is refined little by little, step by step. When you increase the number of steps, the model has more chances to clean up noise and add detail. When you decrease the number of steps, the process is faster, but the image can look rough or incomplete. More steps means slower generation and more refinement. Fewer steps means faster generation and less refinement. There is always a balance between speed and quality. You can think of steps like polishing an object. More polishing gives a smoother result. Less polishing is faster but rougher. In ComfyUI, steps are set inside the KSampler node. For most models, a good starting range is between 20 and 30 steps. Going much higher often gives diminishing returns. Going much lower is useful for fast previews. Steps work together with the seed and the prompt. The seed decides the starting noise. The prompt guides the direction. Steps decide how far the refinement goes. Now we are ready to look at a real workflow in ComfyUI.
This is called text to image, often shortened to txt2img. Text to image means we start from pure noise and generate an image only from text instructions. There is no input image involved. This is usually the first workflow people learn, and it is the best way to explore ideas and styles from scratch. We start by loading a model. This model contains everything the AI learned during training. Next, we give the model instructions using a text prompt. This describes what we want to see in the image. We also define the image size using an empty latent image. This decides the resolution before the image is generated. Then the KSampler runs the diffusion process. This is where noise is removed step by step, guided by the prompt. After that, the VAE decodes the latent result into a visible image. Finally, the image is saved to disk. Use text to image when you want to explore new ideas, you want to test prompts and styles, or you are starting from nothing. This workflow is ideal for concept art and experimentation. But we can also start from an image, not just from pure noise. In that case, instead of beginning with random noise, we use an existing image as the starting point and apply denoise on top of it. You can think of denoise as how much freedom the model has to change the image. With a low denoise, the model stays very close to the original image. With a higher denoise, it moves further away and behaves more like text to image. So rather than generating everything from scratch, we are guiding the diffusion process using an image as the base and then controlling how much it changes using the denoise value. Image to image is like starting with a rough sketch and deciding how much you want to redraw it. You can see that in the text to image workflow we have the Empty Latent Image node. That node generates the noise. In
this workflow, we have an image that is encoded to latent so it can go to the KSampler. Let me show you how I did it. I removed the Empty Latent Image node. Then I double-clicked on the canvas and added a Load Image node. From here we can load an image, and I will choose this robot. Now you can see it does not have a latent output, so we cannot connect it to the KSampler yet. So we need a VAE. If we look, we have decode and encode. We already have VAE Decode, which converts from latent to pixels. Now we want to encode. An easy way to find the right node is to drag a link and release it, and you will see a suggestion for VAE Encode. Now we have a latent output, which means we can connect it to the KSampler, which is what we want. If we try to run it like this, something is missing. It says missing VAE. You can see a big red outline around the node with the problem and a small circle around the input, which means we need a connection there. So let us connect it to the VAE. In this case, the VAE is included in the main model, so we connect it from there. Now we encode it and then we decode it. Let us run again, and now it works. But the result is still different from my input. We have the right prompt, but something is influencing it. Remember this: every time you use an image as input, we need to adjust the denoise, because that controls how much the image changes. With the default value of one, it is at the maximum, so it changes the image too much. Let us change it to 0.2 and see how that affects it. Now, you can see it is very similar to the original. It is hard to tell what parts changed. Let us increase it to 0.5. Now, we can see more changes in the robot face. There is an easy way to compare these images. Double click on the canvas and search for Image Comparer. This is part of the rgthree node pack. You can see it has two inputs, image A and image B. I want to compare the original image, so I will connect the Load Image output to image A. For the second image, remember the Save Image node is only for saving to disk. The image we want to compare is the one coming out of VAE Decode, so we connect that to image B. Now let us run the workflow. We get this small preview. Let me make it larger. It is still too small, so I will move some nodes to make space so you can see it better. By default, it shows image A, the original. When we move the mouse over to the right, it shows the second image. Now it is much easier to compare before and after. If I change the denoise to 0.1, we get a very similar result because the amount of denoise is small. If I change it to 0.9, we get a big variation. All of this is also influenced by the sampler, the scheduler, and the model itself. But in general, this is how it works. I prefer to start with values around 0.7.
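A common way to think about what denoise does under the hood is that it decides how much of the step schedule actually runs. This is a simplification, not ComfyUI's exact implementation, but it is a useful mental model:

```python
# Denoise sketch: denoise scales how much of the step schedule is used.
# denoise = 1.0 runs every step from pure noise (like text to image);
# lower values skip the early steps and start from the encoded input.
def steps_executed(total_steps, denoise):
    return round(total_steps * denoise)

print(steps_executed(30, 1.0))  # 30 -> full run, mostly ignores the input
print(steps_executed(30, 0.7))  # 21 -> a good starting point to adjust from
print(steps_executed(30, 0.2))  # 6  -> stays very close to the input
```

This is why a low denoise barely touches the image: only the last few refinement steps run, and they can only make small changes.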
If that is too much, I reduce it to 0.5 and keep adjusting until I like the result. Another thing you should know is that the input image size influences the result size. Since we do not have an Empty Latent Image node where we set width and height, the loaded image decides the size. ComfyUI will also round the size to a multiple of 8. For example, if your image is 511 pixels, it is rounded down to the nearest multiple of 8, which is 504.
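That rounding is easy to sketch:

```python
# Snap a size down to the nearest multiple of 8, since latent images
# work on an 8-pixel grid.
def snap_to_multiple_of_8(size):
    return (size // 8) * 8

print(snap_to_multiple_of_8(511))  # 504
print(snap_to_multiple_of_8(512))  # 512 (already a multiple of 8)
```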
You can also control the input size by resizing or cropping it, like you saw in the earlier chapters. For example, I can add an Upscale Image node here, then redo the connections so the image passes through it. I can upscale to a bigger size with the same ratio. Now when I run it, the final image should be larger because it follows the input image size.
Now we are going to talk about samplers and schedulers, which you can find here in the KSampler. This is one of the most confusing parts at first, but the idea is actually simple. Everything begins with the same initial noise. The seed defines that noise, but once the noise exists, two different systems control what happens next. The sampler decides how noise is removed. It defines the strategy the model uses to go from noisy to clean. Different samplers use different mathematical paths to denoise. Some remove noise more directly. Some refine the image gradually. Some are more random and creative. Some are more stable and precise. Even with the same prompt, the same seed, and the same number of steps, changing the sampler can change the final image. So the key idea is this: the sampler controls how each denoising step is calculated. Or in simple terms, sampler equals how noise is removed. The scheduler does not change how denoising works. It changes when denoising happens during the steps. A linear scheduler spreads denoising evenly across all steps. Each step removes roughly the same amount of noise. A nonlinear scheduler removes noise faster at the beginning and slower near the end. This allows fast structure early and fine detail later. Both approaches can reach a clean image, but they feel different in how detail is introduced. So the key idea here is: the scheduler controls when noise is removed, or simply, scheduler equals when noise is removed. Sampler and scheduler always work together. You never choose one without the other. The sampler chooses the denoising method. The scheduler chooses the timing of that denoising. The same noise plus a different sampler or a different scheduler can produce different results.
Let us do a little experiment in ComfyUI. From workflows, I open this text to image workflow again and I change the seed to fixed. Then I run the workflow. With this sampler and scheduler, we get this robot. Here we have a lot of samplers and schedulers. Depending on the model we use, some work better than others. Let us say I pick the Euler sampler. Now when I run it, even if the seed and prompt are the same, the result is slightly different, because the sampler influences how the denoising is applied. Let us say I also change the scheduler to simple. Now the result will again be different, because the scheduler changes when the denoising happens during the steps. Because the model we use is quite small, we can actually preview multiple results at the same time. So I hold the Ctrl key and drag over these three nodes. Then I use Ctrl + C to copy them and Ctrl + Shift + V to paste them with the links connected. Now this workflow will generate two images and has two KSampler nodes. Let me use Ctrl + Shift + V again to get a third one. Now this workflow uses the same seed and prompt with three different KSampler nodes. And I want to change the samplers and schedulers for each one. You can play with these all day and try many combinations. I will choose something random for this example. Now when I run it, you can see it generates an image for each sampler. Some results are quite similar, but some details are different. For example, parts of the robot may change from one image to another. Let me now put the same sampler on all of them and use different schedulers only, so we can see how the timing of denoising influences the result.
Again, the differences are subtle, but they are there. Sometimes this can mean one image has five fingers and another has six. So having options is useful, especially when you want small variations. Now let us double click on the canvas and add a primitive node. I want to control the steps value for all three KSamplers, but I do not want to change it manually on each one. So I drag a connection from the primitive node to the steps input of the first KSampler, then do the same for the second and the third one. Now from this single node I can control all three. If I change steps to one, you can see we get very similar results. If I change steps to three, you can already see differences. Some schedulers are faster. For example, with one, the image is still very noisy, while with another, you can already see a shape forming. If I change to four steps, the differences become more visible. At five steps, some start to form clearer shapes. At six steps, some images already show eyes and a main structure. At eight steps, the middle one is almost fully formed. At 10 steps, almost all of them have something that could work for certain concepts. And at 20 steps, most of them have enough detail to be usable in a project.
Usually, the people who create AI models suggest specific samplers and schedulers, or the community tests them and shares which ones work best. This way, you do not have to test everything yourself for every model. But if you do find good settings, it is always a good idea to share them with the community so everyone can improve their image generation results.

Let's talk a little about subgraphs in ComfyUI. Go to workflows and open the Juggernaut text to image workflow. Here you can see a bunch of nodes. Just like before, hold the control key and drag to select most of the nodes except the export node, which in this case is the save image node. Now that the nodes are selected, look at the icons at the top. One of them says convert selection to subgraph. When you click it, all those selected nodes are combined into a single node. If you right click on this new node, you will see an option called unpack subgraph. When you click it, the nodes go back to how they were before. Let's do it again. Select two or more nodes, then use the subgraph button to create a subgraph. Resize it so it is easier to see.

A subgraph is a way to group multiple nodes into a single reusable block. Instead of showing a long chain of nodes every time, you collapse them into one node that represents an entire process. You can think of a subgraph like a function or a macro. Inside it there can be many nodes, but from the outside it looks simple. It is very similar to smart objects in Photoshop, which can contain multiple layers inside a single object.

Subgraphs solve three main problems. First, they reduce visual clutter. Large workflows can become messy very quickly, and subgraphs help keep things readable. Second, they help reuse logic. If you repeat the same setup many times, like a prompt encoding chain or an image pre-processing step, you can reuse it instead of rebuilding it every time. Third, they make workflows easier to explain and share. People understand a few clean blocks much faster than dozens of individual nodes.

At the time I recorded this tutorial, subgraphs were still being improved and may still have some bugs. A subgraph does not make a workflow faster by itself. It is about organization, not performance. Performance depends on the nodes inside the subgraph, not on the subgraph wrapper. A subgraph is like putting many Lego pieces into one box and labeling the box with what it does. All the pieces are still there. You just do not need to see them all the time.
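The function/macro analogy can be sketched in plain Python: several internal "nodes" hidden behind one call, with only the exposed widgets as parameters. The node and parameter names here are invented for illustration, not ComfyUI internals:

```python
def text_to_image(model: str, positive: str, negative: str,
                  steps: int = 20, seed: int = 0) -> dict:
    """Acts like a subgraph: many internal steps live inside,
    but from the outside only a few parameters are visible."""
    checkpoint = {"name": model}            # load checkpoint "node"
    cond_pos = {"clip_text": positive}      # positive prompt "node"
    cond_neg = {"clip_text": negative}      # negative prompt "node"
    latent = {"width": 512, "height": 512}  # empty latent "node"
    # KSampler "node": combines everything into one result
    return {"model": checkpoint, "pos": cond_pos, "neg": cond_neg,
            "latent": latent, "steps": steps, "seed": seed}

# From the outside, it looks like a single node with a few widgets:
result = text_to_image("juggernaut", "a robot", "blurry", steps=8)
print(result["steps"])
```

Just as with a subgraph, callers never see the internal wiring; they only interact with the exposed inputs.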
You can see the title says new subgraph. Let's double click on that and rename it to something that makes sense, like text to image, and maybe also include the model name. So, Juggernaut text to image. Now it looks like a simple workflow with only two nodes. I do not like the order in which things appear in the node, so let's right click on the node and select edit subgraph widgets. Here you can choose which parameters to show in that node and which to hide. Let me hide all of them so we have a clean subgraph that does not show any parameters. You can enable them one by one later if you want only the ones you need. But we will build those manually so you understand them better. Let's close this panel.

Now let's go inside the subgraph. You can see that all the nodes are there, plus some input and output. On top you can see a new tab next to the workflow name. If I click on the main workflow name, that is how we exit the subgraph. From there we can go back inside, and from inside we can go back outside. You can see the output where the image is saved. If we go inside the subgraph, that image output appears here as a link. From this dot we can drag a connection to where it says checkpoint name. Now that field becomes gray, just like when we added a primitive node before. If we go back outside, you can see that the checkpoint name appears here. Let's go back inside again, double click on that name, and rename it to model to see what happens. Now, when we go back to the main workflow, you can see it says model instead of checkpoint name. So, this is very customizable.

Let's go back inside and drag another connection, this time to the positive prompt, and rename it so we know what it is. Do the same for the negative prompt. Now, when we go back outside, we have positive and negative prompt visible. Go back inside again and drag connections to width and height, and maybe do the same for all the parameters from the KSampler. Now we can see all those parameters exposed here. And when we go back outside, we have this single node that acts like a mini interface that can control everything we need.

You might say that it looks nice, but does it actually work? Let's try it. And the answer is yes, it works. If you right click on it, you can see it still has other options like node color, bypass, and so on. With that subgraph selected, right click on the canvas this time, and you will see an option called save selected as template. It asks for a name. I will name it Juggernaut text to image. Then press enter or confirm. It
looks like nothing happened, but where was that template saved? Let's open a new workflow. Now right click on the canvas and go to node templates. You can see that name there now. And you also have the option to manage templates and remove them. When I select that template, it is added to the canvas with all the nodes, connections, and settings it has inside. Now we can just drag a link from the image output and add a save image node, or connect it to other nodes to create more complex workflows. Over time, this simplifies workflows because we can organize them into pieces and group them by category or function. Let's go back to the first workflow just to show you that any node or combination of nodes can be saved as a template. There are cases where some connections can break when some nodes are inside a subgraph and others are outside, so keep that in mind. For example, I use this Pixaroma note node a lot. I want to save it as a template so I can access it easily next time. This might not be useful for everyone, but as a workflow and tutorial creator, I use this a lot. I will save it as a template and give it a name. Now I can go to any other workflow and quickly access that template from anywhere. You can also have subgraphs inside other subgraphs, like boxes inside boxes.
You can disconnect or remove links at any time. I could select two nodes here and combine them into another subgraph, or go outside and combine all these nodes, even if some are simple nodes and one is already a subgraph, and it will still let me create a new subgraph. If we go inside, all those nodes are there. If we go back outside, we can unpack it using the icon, or right click and choose unpack subgraph. These things will make more sense as you work with them in practice. So play with them and have fun. When you see that icon on a node, you know it is a subgraph. It also has the icon that lets you go inside the subgraph, which is another indicator that it is not a simple node. Remember that you can also use the interface to edit subgraph widgets. One thing I forgot to show is that you can use those dots to rearrange the order of the parameters shown in the subgraph node. This way you do not need to go inside it. Most of the time you can control things directly from the outside. Now we
are going to talk about LoRAs. LoRA stands for low-rank adaptation. In simple terms, a LoRA is a small add-on that modifies how a base model behaves. A LoRA does not replace the model. It works together with the model. You can think of the base model as the main photographer we hired earlier. A LoRA is like giving that photographer extra experience in a specific style or subject.

Why do LoRAs exist? Training a full model is very expensive. It requires a lot of images, time, and powerful hardware. LoRAs exist to solve this problem. Instead of retraining a full model, we train a small adapter that teaches the model something new. This could be a specific art style, a character, a face, a pose style, or a lighting style. LoRAs are much smaller than full models. That is why they are easy to download and experiment with. So remember, a LoRA does not work by itself. It always needs a base model and a compatible architecture. For example, a Stable Diffusion 1.5 LoRA needs a Stable Diffusion 1.5 model. An SDXL LoRA needs an SDXL model, and so on. If you mix incompatible models and LoRAs, the results will be broken or random.
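The "low-rank adaptation" idea can be sketched with NumPy: instead of replacing a big weight matrix, a LoRA adds a small low-rank update on top of it. This is a conceptual sketch with made-up sizes, not real training or inference code:

```python
import numpy as np

rng = np.random.default_rng(0)

# One "base model" weight matrix (real models have many, much larger ones).
d = 512
W = rng.standard_normal((d, d))

# A LoRA stores two thin matrices A and B with a small rank r.
r = 8
A = rng.standard_normal((r, d)) * 0.01
B = rng.standard_normal((d, r)) * 0.01

strength = 0.8  # the strength widget scales the LoRA's contribution

# Applying the LoRA: W' = W + strength * (B @ A).
# The base weights themselves are never overwritten.
W_adapted = W + strength * (B @ A)

# The LoRA needs far fewer numbers than the matrix it modifies:
print(W.size, A.size + B.size)  # 262144 vs 8192
```

This is also why a LoRA must match its base model: the update only makes sense added onto the exact weight shapes it was trained against.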
Let's open ComfyUI to test it, because what is theory without practice, right? Open workflow 3, the one that has LoRA in the name. As you can see, the workflow is very similar to what we had before. That is one of the reasons I am using this older model instead of a newer one. It is easier to learn the basics first, and then we can make things more complex as we move forward. Compared to the first text to image workflow we used earlier, we now have this LoRA loader node that loads a LoRA model. In our photographer analogy, this means the photographer took some classes on how to take photos of cakes and is now specialized in that subject.

Let's look at the note node first. We need to download the LoRA model. Remember, the workflow comes with nodes and settings, but since it is just a text file, it cannot include the actual models. We have to download those separately and place them in the correct folder. In this case, we are using a LoRA called cake style. It is a small model trained on images of cakes, so it understands cakes better than the base model alone. A few years ago, when Stable Diffusion 1.5 models first appeared, they could not handle many subjects very well, and LoRAs were often used to fix those limitations. So, we need to download this LoRA and place it inside the loras folder. Click where it says here. Then we need to place that file in the loras folder. Go to your ComfyUI folder, open the models folder, and then find the loras folder. If we place it directly here, it will work perfectly. But this time, I want to keep things organized. I want to create a folder that tells me which base model this LoRA is compatible with. So, I will create a folder called SD15. This way I know it works with that model and I do not mix it with others. Save the LoRA inside that folder. If you look at the file now, you can see that the LoRA is much smaller than the base model. All LoRAs should go into this folder, and it is best to organize them by base model name like SDXL, Flux, Qwen, and so on, just like we did with checkpoints in an earlier chapter.
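You would normally create these folders in Explorer, but the same layout can be sketched in a few lines of Python. The install path here is an assumption; use your own ComfyUI location:

```python
import os

# Assumed ComfyUI install location; replace with your own path.
base = os.path.join("ComfyUI", "models", "loras")

# One subfolder per base model family, so compatibility stays obvious
# and SD 1.5 LoRAs never get mixed with SDXL or Flux ones.
for family in ("SD15", "SDXL", "Flux", "Qwen"):
    os.makedirs(os.path.join(base, family), exist_ok=True)

# A downloaded LoRA file then goes into the folder matching its base
# model, e.g. ComfyUI/models/loras/SD15/cake_style.safetensors

print(sorted(os.listdir(base)))
```

The folder names are only a convention for yourself; ComfyUI scans the whole loras folder either way.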
Now go back to ComfyUI. We have everything we need to run this workflow, but because ComfyUI was already open when we downloaded the model, it cannot see it yet. We need to press the R key to refresh the node definitions. Now it appears in the list and we can select it. You can also see a note here with a trigger word. I like to add these notes so I remember them. Many LoRAs are trained using specific trigger words. These are words the LoRA learned during training. If you do not include the trigger word in the prompt, the LoRA may have little or no effect. Some LoRAs work without trigger words, but many require them. Always read the LoRA description from the place where you downloaded it. If we look at the positive prompt, we first added the trigger words so we do not forget them. It is not required to be first. It can also be placed after a few words, but I like to put it first. Then we have the prompt for a robot. I did it this way so we can clearly see how the LoRA and a simple trigger word affect the result.
Now if we run this workflow, we get this robot cake. You might think this model could do that without a LoRA, but it depends on the prompt and the model. Let me change the seed to fixed so we can get a consistent result. So this is how it looks with the LoRA applied. Now what I want to do is run the workflow without the LoRA, without changing anything else. Same prompt, same settings, same seed, just disable the LoRA. To do that, I right click on this node and choose bypass. Now, when I run the workflow, the LoRA is bypassed, and you can see we get a normal robot instead of a cake robot. If I enable the node again and run it, you can clearly see the effect the LoRA has on the image.

Now that you see how it works, let's adapt a normal text to image workflow and add the LoRA ourselves for practice. Open workflow 1, the basic text to image workflow. Now I want to add the LoRA between the model and the KSampler. Double click on the canvas, search for LoRA, and add the node called LoRA loader model only. Let me resize it so the text is easier to see. I also like to color these nodes blue so I can spot them faster in big workflows, but that is optional. Now, we need the model connection to go through this node. If you look now, the model is connected directly to the KSampler, but we want the extra knowledge from the LoRA. Drag a connection from the model output to the LoRA loader, and then from the LoRA loader to the KSampler. The workflow is now complete. Let's set the seed to fixed so we can clearly see how different settings affect the result. It runs without errors, so everything is connected correctly. Even if a LoRA sometimes works without a trigger word, it is best to include it when one is provided. So let's add the trigger word cake style to the positive prompt. Now when I run it, we get a different result even though the seed is fixed. That shows the LoRA is doing its job. If I change the seed, we get another variation. To avoid forgetting trigger words, I like to add a note node. I write the trigger word there, change the note title so it is clear what it is for, and often change the color to match the LoRA nodes so I know they are related.
One important thing I have not mentioned yet is that you can use multiple LoRAs. If I want, I can clone this node by holding alt and dragging, then connect them one after another. You can stack several LoRAs this way. I personally do not use more than three or four at once. In this setup, the base model is combined with the first LoRA, then the second LoRA, and all that information goes into the KSampler. In the prompt, you add the trigger words for all the LoRAs you use. If I run this now, some strange things can happen. First, I used the same LoRA twice, which makes its effect too strong. Second, when using multiple LoRAs, it is usually a good idea to reduce their strength so they blend better instead of overpowering the image. If I lower the strength values, the result becomes much more stable and usable. If your result looks too weird, one of the first things to try is reducing the LoRA strength. Let me delete the extra LoRA and keep only one, then set its strength to one. Each LoRA has a strength value. This controls how strongly the LoRA affects the model. Low values give subtle influence. High values give strong influence. If the value is too high, images can break, faces can deform, and styles can become unstable. A good starting range is usually between 0.6 and 1.0. There is no universal best value. Each LoRA behaves differently.
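Stacking can be pictured as applying each LoRA's update in sequence, each scaled by its own strength. This is a conceptual sketch with made-up numbers, not ComfyUI's internal code:

```python
import numpy as np

d, r = 64, 4
rng = np.random.default_rng(1)
W = rng.standard_normal((d, d))  # base model weights

def apply_lora(weights, strength, seed):
    """Add one LoRA's low-rank update, scaled by its strength widget."""
    g = np.random.default_rng(seed)
    B = g.standard_normal((d, r)) * 0.05
    A = g.standard_normal((r, d)) * 0.05
    return weights + strength * (B @ A)

# Chaining loaders: base -> LoRA 1 -> LoRA 2 -> KSampler.
stacked_full = apply_lora(apply_lora(W, 1.0, seed=10), 1.0, seed=20)
stacked_soft = apply_lora(apply_lora(W, 0.6, seed=10), 0.6, seed=20)

# Lower strengths keep the result closer to the base model, which is
# why reducing them often stabilizes the image when LoRAs are stacked.
drift_full = np.abs(stacked_full - W).mean()
drift_soft = np.abs(stacked_soft - W).mean()
print(drift_full > drift_soft)  # True
```

Each extra LoRA pushes the weights further from the base model, so the combined drift is what you are tuning when you lower the strength sliders.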
Let's delete this LoRA loader node, and let me show you another node you can use, this time from a custom node. Search for Power Lora Loader. This one comes from the rgthree node pack. What is different compared to the previous one is that it has two inputs and two outputs. Because of that, we need to route both the model and the clip through this node. First, connect the model output to the Power Lora Loader model input. Then connect the clip output to the clip input on this node. After that, the clip output from the Power Lora Loader goes to both the positive and negative prompt nodes. Let me select all these nodes and move them a bit so you can see more clearly how both the model and the clip go through the LoRA loader. Now we can add the LoRA we want directly inside this node. We can also add multiple LoRAs here. You can see that I can add a second one and even a third one. If I right click on a LoRA entry, I can remove it. I can do the same for any of them. If we right click on a LoRA and choose show info, we get more details. There is also a button called fetch from Civitai. Civitai is a website that hosts models, and you will see it later. If the LoRA is public and available on Civitai, this will fetch useful information about it, including examples and trigger words. We also have toggle buttons here. We can toggle all LoRAs on or off, or toggle them individually. Let me add another LoRA so you can see how that works. This way we can load multiple LoRAs but enable only the ones we want at any moment.

After playing a bit with this LoRA, I found that a strength value around 0.55 to 0.6 works better for this specific one. I tried it with the trigger word, then added an orange goldfish, cute and adorable, and got this result. It is not bad for such a small model. For a second example, I tried a marzipan cake shaped like a woman and got this result. For a third one, I tested a marzipan castle. Again, this is just for practice. Later, you will see better models that can produce much higher quality images with fewer errors.

LoRAs are lightweight. They do not increase VRAM usage very much. Common beginner mistakes are using the wrong base model, using strength values that are too high, forgetting trigger words, and expecting a LoRA to fully replace a model. LoRAs enhance models. They do not replace them. Stacking many LoRAs can slow things down slightly, but not dramatically. For beginners, it is best to start with one LoRA at a time. The base model is the photographer. The LoRA is a specialty training course that photographer took. The photographer still uses the same camera. They just learned a new style.
Now that you understand diffusion, prompts, image to image, LoRAs, and workflows, we are ready to talk about ControlNet. ControlNet is one of the most powerful features you can use in ComfyUI. In simple terms, ControlNet lets you guide image generation using an extra image, not just text. Instead of saying what you want only with words, you can also show the model what you want.

What is ControlNet? ControlNet is an additional neural network that works alongside the main diffusion model. It does not replace the model. It does not replace the prompt. It adds extra control. The base model still does the image generation. The prompt still guides the style and subject. ControlNet adds structure and constraints. You can think of it like this. The prompt says what the image should look like. The seed decides the starting noise. The sampler and scheduler decide how noise is removed. ControlNet tells the model where things should go.

Why does ControlNet exist? Text prompts are powerful, but they are also vague. If you say a person standing, the model decides the pose. If you say a city street, the model decides the layout. If you say a face, the model decides the proportions. ControlNet exists for cases where you want more control. For example, you want a specific pose. You want a specific composition. You want to follow a sketch. You want to preserve the structure of an input image. ControlNet makes results more predictable and repeatable. This is a simplified explanation. In reality, ControlNet works by injecting additional conditioning into the diffusion process at every denoising step. But you do not need to understand the math for learning and practical use. This mental model is enough. ControlNet guides structure while diffusion fills in details. Let's
open ComfyUI. Go to workflows and select workflow number four, the one that has ControlNet in the name. The workflow is similar to the text to image workflow. It is still a text to image workflow, but it is guided by an image using ControlNet. You can quickly tell it is text to image because of the empty latent image node. I highlighted in yellow the nodes that we usually use for ControlNet. Let's go to the note node to see what we need. The checkpoint model was already downloaded earlier. We also need to download some ControlNet models and custom nodes. We need this specific custom node, which comes with the easy install version. But if you are using a different ComfyUI version, you need to install this node first. We have a Canny model, another one called depth, and another one called open pose. There are more types available, but these are the most popular and commonly used ones.

Let's download all three so we can test them. First, download the Canny model. Then, go to your ComfyUI folder. Open the models folder and think about where this model should go. If you guessed the controlnet folder, you are right. We place it there because different base models can have different ControlNet models. Just like with LoRAs, a ControlNet is only compatible with the base model it was trained for. So, let's organize them properly and create a folder so we know which base model these ControlNets are compatible with. Save the model in that folder. Next, download the depth model and save it in the same folder. Then download the open pose model and save it in the same folder as well. Wait for all downloads to finish. If ComfyUI was open while downloading, we need to refresh it. Press the R key to refresh the node definitions. Now we have everything we need to run this workflow.

In the workflow, we have an apply ControlNet node with some settings. We also have a node that loads the ControlNet model we downloaded, and a preprocessor node that converts the input image into a format that ControlNet understands and was trained on. Let's run the workflow. In this example, we are using the Canny model. We loaded a bunny sketch, and with the help of the preprocessor, it generates a Canny map, which is an image that detects the edges of the input image. With the prompt, we influence what we want to generate. And with apply ControlNet, the model interprets that Canny map and uses it to guide the generation to get this image. You can imagine that without ControlNet, it would be very hard to get something this complex using only a prompt, especially with the small model we are using today. Now, let's build this workflow ourselves so you understand it better. Open again the first workflow that you already know how to build, and we will adapt it to use ControlNet.
We know ControlNet comes before the KSampler, so let's move some nodes to make room for it. Double click on the canvas and search for apply ControlNet. Add the node and change its color to yellow so it is easy to recognize. Now let's connect the parts that are obvious first. Positive goes to positive, negative goes to negative. For the outputs, there is only one place where they make sense, so we connect those as well. At this point, the node still has missing inputs. One of them is the VAE. We already know where the VAE comes from. In our case, it is included in the checkpoint model, the same VAE we already used for encode and decode. The next missing input is the ControlNet model itself. Double click on the canvas, search for load ControlNet, add the node, and color it yellow as well. Now connect it to the apply ControlNet node. The last missing input is the image. Add a load image node. Then we can select an image. In this case, I will use a bunny sketch.

You might be tempted to connect this image directly to ControlNet, but that usually does not work. ControlNet expects a very specific type of image, because it was trained on that type of data. Our sketch is just a normal image. So, we need a preprocessor to convert it into something ControlNet understands. Double click on the canvas and search for AIO, which stands for all-in-one. Add the preprocessor node and color it yellow. Connect the load image node to the preprocessor, then connect the preprocessor to the apply ControlNet node. Right now, the preprocessor is set to none, so we need to choose one. Since we plan to use a Canny ControlNet model, select a Canny edge preprocessor.
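What an edge preprocessor does can be sketched with a tiny gradient-based edge detector. Real Canny also involves smoothing, non-maximum suppression, and hysteresis; this simplified stand-in only thresholds the gradient magnitude:

```python
import numpy as np

def simple_edge_map(gray: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Very rough stand-in for a Canny preprocessor: mark pixels where
    brightness changes sharply. Input is a 2D array in [0, 1]; output
    is white (1.0) edges on a black (0.0) background."""
    gy, gx = np.gradient(gray.astype(float))
    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold).astype(float)

# A toy "sketch": a dark square on a light background.
img = np.ones((8, 8))
img[2:6, 2:6] = 0.0

edges = simple_edge_map(img)
print(int(edges.sum()))  # nonzero: edges found along the square's border
```

The output is exactly the kind of white-on-black map we will see in the preview node: only the outlines survive, which is all the Canny ControlNet was trained to read.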
To better understand what is happening, add a preview image node after the preprocessor. This allows us to see the control image that is actually being sent to ControlNet. When we run the workflow, we get a Canny map, white edges on a black background. This shows exactly what ControlNet will use as guidance. You can also adjust the resolution here if you want more detail in the map.

Now, let's look at the result. It does not look very good yet. This happens often when working with ControlNet, and there are a few things to check. First, look at the prompt. We are still using a robot prompt, but the image is a bunny. So, let's change the prompt to something like a watercolor painting of a bunny. Next, reduce the ControlNet strength and the end percent slightly. Run again. The result is a bit better, but it still does not follow the sketch very well. If changing the seed does not help, the next thing to check is the ControlNet model itself. If you look at the load ControlNet node, you may notice that the selected model is not a Canny model, but a depth model. That is the problem. The preprocessor and the ControlNet model must match. Select the correct Canny ControlNet model. Now run the workflow again. The result is much better and follows the sketch closely. Let's try another
example. Load a 3D text image. Since this image has depth information, we can try a depth ControlNet instead. Change the ControlNet model to depth and update the prompt to something like golden text in snow. When you run it, you may notice the preview still looks like a Canny map. That means we forgot to change the preprocessor. Switch the preprocessor to a depth preprocessor. The first time you run a new preprocessor, ComfyUI may take longer because it downloads a small model automatically. This only happens once. If you get a long path error on Windows, close ComfyUI and run the long path enabler from the tools folder.

Now we see a depth map. Dark areas represent parts that are farther away. Lighter areas represent parts that are closer. ControlNet uses this information to understand spatial structure. The generated result now follows the depth and composition of the original image very closely. If for some reason you get an error saying the model is incomplete or something similar, you can close ComfyUI, go to the tools folder, and run the batch file called long path enabler. This should fix the long path issue and allow ComfyUI to download the model it needs even when the file path is longer. You can also try the same image with a Canny ControlNet. Switch both the model and the preprocessor back to Canny and run again. Even if some edges are missing, it can still guide the generation. Then try switching back to depth and compare results. Often, one will work better than the other depending on the image. Now, let's talk about the key ControlNet parameters.
ControlNet does not replace diffusion. It only guides it during certain parts of denoising. Strength controls how strongly ControlNet influences the image. Low values make ControlNet a soft suggestion and the model can drift away. High values strongly enforce structure and make the output closely follow the control image. Typical values are between 0.5 and 0.7 for natural results and 0.8 to 1 for strict structure matching. Start percent controls when ControlNet begins influencing the denoising process. A value of zero means ControlNet starts from the very first step, locking structure early. Higher values allow the model to form rough shapes first before ControlNet takes over. End percent controls when ControlNet stops influencing denoising. A value of one means ControlNet stays active until the end, locking structure even in fine details. Lower values allow ControlNet to stop earlier, letting the model finish on its own and add more style. In simple terms, strength is how hard ControlNet pulls. Start percent is when it starts pulling and end percent is when it lets go. That is why ControlNet is so powerful. You can guide structure without killing creativity.
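To make the three parameters concrete, here is a toy sketch in Python. This is not ComfyUI's actual sampler code, just an illustration of how strength, start percent, and end percent together decide whether ControlNet is pulling at a given denoising step:

```python
def controlnet_influence(step: int, total_steps: int,
                         strength: float = 0.6,
                         start_percent: float = 0.0,
                         end_percent: float = 1.0) -> float:
    """How hard ControlNet 'pulls' at a given denoising step.

    Outside the [start_percent, end_percent] window it contributes nothing;
    inside the window it contributes `strength`.
    """
    progress = step / total_steps  # 0.0 at the first step, near 1.0 at the last
    return strength if start_percent <= progress <= end_percent else 0.0

# End percent 0.6 means ControlNet lets go for the last 40% of the steps,
# leaving the model free to finish style and fine detail on its own.
print(controlnet_influence(step=2, total_steps=10, strength=0.8, end_percent=0.6))  # 0.8
print(controlnet_influence(step=9, total_steps=10, strength=0.8, end_percent=0.6))  # 0.0
```

So a setting like strength 0.8 with end percent 0.6 enforces composition early and then hands the last steps back to the model.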
It is time to test the pose ControlNet as well. But first, let us add a pose image as reference. Let us say I add this woman. Again, it is kind of hard to put into words the exact same pose. So this is a good use case for ControlNet. Let us change the prompt to something else. Maybe woman in a sumo yoga pose. Not sure what to call it. Then do not forget to change the model to open pose. I know it might seem complex at first, but newer models have union ControlNet models that include everything in one. So you only load one model, which makes things easier. That said, this is how we used to do it. And there are still cases where we need specific models. Then we can try either DW pose or open pose. Type pose and select this open pose. And then let us run it again. It will take a bit since this is the first time I use it. In fact, if you look at the command window, you can see it is downloading that model from Hugging Face. That is why it takes so long. After it finishes, it gives this pose image that looks like a skeleton with each color representing a bone. That is how it knows which side is right and left and so on.

So it captured the pose and now let us see the result. Holy sumo, what is this? Okay, let us adjust the prompt. Maybe a fit woman will help. That did not help much. I bet the word sumo has too much weight. Like we talked about before, some words have more power than others. So if I try without that word, I get a better result. Even the face is not so great. And that can happen with people in the distance. Usually with portraits, we get better faces. Newer models fixed most of that. Let me try to change the resolution to see if something changes. Now I get a better resolution for the skeleton. And the pose is okay. Just the face. That face does not invite me to do yoga. Okay. Let us try a different pose.
Something for a portrait. Let us say I use this portrait photo. Let us change the prompt to a businesswoman and run it to see what we get. The pose looks okay. Even if it is missing an arm, it should still work. The results are much better now that the face is closer. Let us try a warrior woman as well. That works well too.

So with ControlNet, you have to continuously search for balance. Make sure you select the right ControlNet model for the job. Then choose a pre-processor that matches the model. As you saw, it is easy to forget to change something. I usually play with strength and end percent. Also, do not forget ControlNet models made for SD 1.5 only work with SD 1.5 base models. If you use SDXL, you need SDXL ControlNet models. In a later episode, we will check some advanced models that do not even need ControlNet and can do everything from prompts. Beginner mistakes to avoid: using the wrong ControlNet model for the base model, forgetting to install or download ControlNet models, using very high strength values, expecting ControlNet to fix bad prompts, and using ControlNet when it is not needed. ControlNet is a tool, not a magic fix.

When you start using Comfy UI, you will notice there are many different model types: AIO models,
FP16, FP8, GGUF, and others. This can be confusing at first, but the reason is actually simple. At their core, all
diffusion models are just very large collections of numbers. Those numbers represent what the model has learned.
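Before looking at each format, a rough back-of-the-envelope illustration helps: the same collection of learned numbers takes very different amounts of memory depending on how many bytes each number uses. The parameter count below is made up for the example, not any real model's size:

```python
# Rough VRAM footprint of the same weights stored at different precisions.
params = 2_000_000_000  # an illustrative 2 billion learned numbers

bytes_per_value = {"FP32": 4, "FP16": 2, "FP8": 1}
for fmt, nbytes in bytes_per_value.items():
    gb = params * nbytes / 1024**3
    print(f"{fmt}: ~{gb:.1f} GB")
# FP32: ~7.5 GB, FP16: ~3.7 GB, FP8: ~1.9 GB -- same knowledge, less space.
# GGUF quantization goes further, averaging around 4 bits per value or less.
```

Halving the bytes per number halves the file and the memory needed to hold it, which is the whole story behind these format names.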
The knowledge itself does not change, but the way those numbers are stored can change. Different model formats exist to balance memory usage, speed, and hardware compatibility. Some formats are larger but more precise. Others are smaller and faster, but slightly less accurate. FP32 is the highest precision and is mostly used for training. It uses a lot of memory and is rarely used for image generation. FP16 is the most common format for stable diffusion. It offers a very good balance between image quality and VRAM usage. This is the safest and most recommended choice for most users. FP8 uses even less memory and can be faster on newer GPUs that support it. The trade-off is that it can sometimes reduce image stability or detail slightly. AIO stands for all-in-one. AIO models bundle the main model, VAE, and sometimes clip into a single file. They are designed to be easy to use and reduce setup mistakes. The downside is that they give you less flexibility if you want to swap components later. GGUF models come from the language model world. GGUF stands for GPT-Generated Unified Format. They are optimized for very low memory usage and can run on CPU or low VRAM systems.

It is important to understand that these formats do not make the model smarter or more creative. They do not change what the model knows. They only change how efficiently that knowledge is stored and processed. You can think of it like the same video saved in different resolutions. The content is the same, but the file size and playback requirements are different. For most users, FP16 models are the best starting point. AIO models are great for beginners. FP8 is useful if your GPU supports it. GGUF is best when memory is very limited. Once you understand this, choosing models becomes much easier.

You saw in the workflows that I include links to models, but you might wonder where I find those models, right? One of
the sites is Hugging Face, but it is not the most beginner-friendly one. At the top, we have a models tab. And here you can find a lot of models, but not all of them are diffusion models or used for generating images. Some are for video, some for audio, some for large language models, and many are not compatible with Comfy UI. Some require different interfaces to run or they are so large that you cannot even run them on your computer. For example, I can sort them by text to image. And here you can see some popular ones like Qwen, Z Image, or even the Flux model. If I click on one of these, you will see that some models require you to sign in and accept certain terms. Each model has a license. Some are open-source, some are free with conditions, and others are available only in certain countries. By default, you are on the model card. This is basically an info page about the model. On another tab, you have files and versions, where the files are usually available in different formats like in this example. And there are a few more files inside those folders. Let us go back to the homepage. Here you can also search for a model if you know the name or browse popular ones like Z Image. Always check the tabs at the top to find more information about the models since they can be quite large and you want to make sure you can actually run them on your system. I know this is a lot of information, but as I said before, I usually include the model link directly in the workflow, so you do not have to stress about it. Still, it is good to understand how models work and where they come from.

Another site that is more beginner-friendly and better organized is the Civitai website. The downside is that recently they removed access for some countries like the UK. If you are from one of those countries, you will need a VPN to access it and download models. If you click on the models tab, you can find all kinds of models for different interfaces like Comfy UI, Forge UI, and others. Most of them are compatible with Comfy UI. On
the right side, you have filters. These let you sort models by when they were added. You can also filter by model type. For example, checkpoints are the main AI models. You can also filter by LoRA or ControlNet since we talked about those model types in previous chapters. Of course, you can also filter by base model so you know what is compatible with your workflow. The first workflow we used was based on an SD 1.5 model, but I can also sort by other ones like the Flux Dev model or an older one like SDXL. By the way, SDXL is newer than SD 1.5 and Flux is even newer than SDXL. So, use these buttons to sort models. If you already know the name, you can just search for it. For example, I can search for Juggernaut. Here you can see multiple versions of that model based on different base models like SDXL or SD 1.5. If I click on SD 1.5, I will only see those versions.

If I click on the one that says Juggernaut, it opens the model info page. At the top, you can see different versions. We used the Reborn version, but you can try other versions as well. Below that, you have details about the model. It clearly says what type it is. It can be a checkpoint, a LoRA, or a checkpoint merge. In this case, it is a checkpoint merge, which means the main SD 1.5 model was mixed with other SD 1.5 models to combine the best parts of each one. It also clearly states that this is a base SD 1.5 model. You can see the publish date as well, which shows that it is quite old. At the top you have the download button and the file will go into the correct folder. In this case, it goes into the checkpoints folder. As I mentioned before, the author sometimes includes recommended settings. You can see them here. This is how I knew what settings to use in the K sampler for the workflow. At the top, you also have a gallery with images generated using that model. This helps you understand what the model is capable of. Some images also have an info button that shows the prompt and settings used to generate that image. So, explore Civitai if you have access to it and see what models and LoRAs are available. Once you are signed in, you also get more options to control what type of models are visible since some are disabled by default. So,
now that we played a little with that old SD 1.5 Juggernaut model, it is time to try a better, newer model to see how far AI has come in just 2 years. Let us go to workflows again and this time open the workflow named 5A, the one for Z Image Turbo with the all-in-one model. The workflow is quite similar to the others we tried. We just have two extra nodes this time. One of them is this conditioning node that we use instead of the one for the negative prompt. And the other one is this model sampling node. Since we are using a new model, we need to download it because we do not have it yet. The model is called Z Image Turbo.

Juggernaut and Z Image Turbo are very different types of models built with different goals in mind. Juggernaut is based on Stable Diffusion 1.5. It uses the classic diffusion architecture that has been used for years. The model file itself is relatively small, usually around 2 GB. Juggernaut was created by the community by fine-tuning and merging Stable Diffusion models. Z Image Turbo is a newer type of model created by the Tongyi team from Alibaba. It uses a more modern architecture designed to generate images more efficiently. Even though Z Image Turbo is much larger in file size, it is optimized to produce good results in very few steps. One important difference is how the models understand prompts.
Juggernaut relies on the classic clip text encoder. It understands prompts if they are short, but it often requires careful wording, sometimes keyword-like prompts. Z Image Turbo uses a more advanced text understanding system inspired by large language models. This allows it to understand prompts in a more semantic and natural way. Because of this, Z Image Turbo can often follow instructions better, even with shorter or more loosely written prompts. So, in simple terms, Juggernaut is smaller, very flexible, and highly compatible. Z Image Turbo is a larger, newer model, and smarter at understanding what you ask for.

So, we have here an all-in-one model. And there are two types, a smaller one, FP8, and a bigger one, BF16. It depends on your graphics card. If you can run the big one, use that one. For this first episode, I want to run it on a low VRAM card, so I will use the FP8 small version. All-in-one means it has everything it needs included, the clip and VAE model in this case, so we do not need to download those models separately. That is why it is easy to use for beginners. The models go into the checkpoints folder and there we can create a special folder for the Z Image model. Also, if you want to learn more about the model, I included an info link here. So click on it. Now we are on the Hugging Face page and you can learn more about this specific version, from workflows to different model versions. If we go to files, we can see different model versions that you can try depending on how good your graphics card is. So let us test the small version. Click here. Then go to Comfy UI, go to models, then checkpoints, and create a folder called Z Image so everything is more organized. Inside this folder, place the model. Since this is a big model, you need to wait for it to finish downloading. Because Comfy UI was open, you can see that it does not appear in the list yet, only Juggernaut. So I press the R key to refresh node definitions. And now we can see both models nicely organized in folders. First is Juggernaut and second is Z Image. So let us select the Z Image model. That is all for the model download.

And now we can run the workflow. The first time you run a workflow, it is slower because it needs to load the model. The second time you run it, it should be faster. For me, it took about 10 seconds because I have a lot of RAM and VRAM. The result looks pretty good compared to the robots we used to get with the SD 1.5 model. We have much nicer details. For the image size, I used a smaller size so it runs faster. This model was trained with bigger images, not like SD 1.5. So, we can even use larger sizes like 1,600 pixels if we want. Even if you go bigger than the size it was trained on, it does not produce many errors like SD 1.5 did. It just becomes a bit more diffused. Usually, for most newer models, a good place to start is around 1,024 pixels. So let us say I try a landscape image this time using these sizes. The result looks pretty good. I like it.
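A quick note on why sizes like 768, 1,024, and 1,600 keep coming up: diffusion models in this family work on a latent that is typically 8 times smaller than the image in each dimension, so widths and heights that divide cleanly by 8 are the safe choice. A minimal sketch of that relationship, assuming the usual downscale factor of 8:

```python
# SD-family VAEs typically compress the image by a factor of 8 per dimension,
# so the model actually denoises a much smaller latent grid.
def latent_size(width: int, height: int, factor: int = 8):
    assert width % factor == 0 and height % factor == 0, "pick multiples of 8"
    return width // factor, height // factor

print(latent_size(1024, 1024))  # -> (128, 128)
print(latent_size(1600, 896))   # -> (200, 112)
```

This is also why generation in latent space is so much faster than working on raw pixels: a 1,024 pixel image becomes a 128 by 128 grid internally.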
Let us go back to workflows again and open the first workflow to see what is different and how we can recreate the Z Image Turbo workflow. We already have the right node to load the model in this case, so I just select the model from the list. For the empty latent, this one is used more for older models with a different architecture. Many newer models use a different empty latent node. If we look at the nodes and search for empty, we have one empty latent and one empty SD3 latent. In this case, we want the one with SD3. On the surface, they look identical. It is just a different latent representation internally. If we make it purple, it looks like the previous one. If you do not have enough VRAM to run this, you can use sizes like 768 for width and height. I will use 1,024 pixels since it is the most popular size and my system can handle it. So let us delete the old empty latent and reconnect the new node. This model does not use a negative prompt, only a positive one, so I will remove the negative text. You can also collapse it if you want. That way you know not to add a negative prompt. Then we have the settings, which as you remember are different from model to model. If we look here, we only have five steps. So, it can generate with fewer steps, and the CFG is one. Let us change the steps to five and the CFG to one. When the CFG is one, it ignores the negative prompt. We also need a sampler and a scheduler. So, let us add the DPM++ SDE sampler.
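The point about CFG one ignoring the negative prompt falls straight out of the standard classifier-free guidance formula: the output blends the prediction with the prompt and the prediction with the negative, and at CFG one the negative branch cancels completely. A tiny numeric sketch with made-up numbers:

```python
import numpy as np

# Classifier-free guidance:  output = uncond + cfg * (cond - uncond)
cond = np.array([0.8, 0.2, 0.5])    # toy "prediction with the positive prompt"
uncond = np.array([0.1, 0.1, 0.1])  # toy "prediction with the negative prompt"

def guided(cfg: float) -> np.ndarray:
    return uncond + cfg * (cond - uncond)

# At cfg = 1 the uncond term cancels out entirely, so whatever you put in
# the negative prompt has exactly zero influence on the result.
assert np.allclose(guided(1.0), cond)
```

Higher CFG values push the output further away from the unconditional prediction, which is why they make the model follow the prompt more aggressively.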
And for the scheduler, we use beta. Let us see what else is missing. We have this extra node called model sampling AuraFlow. It has a long name. Not sure why it cannot be something simpler, but anyway, let us search for that node. We change the shift to three and we make the connection go through that node just like we did with LoRA. The model sampling AuraFlow node is a special node that modifies the model sampling behavior before it goes into the K sampler. It is designed to work with models that use the AuraFlow sampling method, which is an advanced sampling technique used by some modern models for better stability and quality. What this node does is apply a sampling adjustment or patch to the model itself, so the sampler works in the best way for that model. The node takes the current model and a shift value as inputs and outputs a modified version of the model with the AuraFlow sampling logic applied. The shift parameter controls how strong that sampling adjustment is. Changing the shift value can subtly affect contrast, sharpness, and how the generation behaves internally. So, we changed the empty latent to the SD3 version, we added a node to shift the model values, and we adjusted the settings to work better with the Z Image model that we loaded in our workflow. Let us run it and see if it works. As you can see, it works just fine and we get a nice robot.
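To give some intuition for what a shift of three does, here is the shape of a common flow-matching time shift. This is an illustration of the idea, not necessarily the exact formula ComfyUI's node implements internally:

```python
# One common flow-matching time-shift curve (an assumption for illustration):
# it bends the sampling schedule so the sampler spends more of its steps in
# the high-noise region, shaping large structures before fine details.
def time_shift(t: float, shift: float = 3.0) -> float:
    """Remap a sampling timestep t in [0, 1]. shift = 1 leaves it unchanged."""
    return shift * t / (1.0 + (shift - 1.0) * t)

print(time_shift(0.5, shift=1.0))  # 0.5  -- no change
print(time_shift(0.5, shift=3.0))  # 0.75 -- the midpoint is pushed later
```

With only five steps to work with, that kind of rebalancing is part of how turbo models still manage coherent compositions.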
If we look at the previous Z Image workflow, you can see that it does not use a negative prompt, but instead it has a conditioning zero out node. So let us go back to our workflow and search for that conditioning node. As I mentioned before, this model does not use a negative prompt. So you might wonder why we do not just delete the node. We could do that, but then we would have a missing input and the workflow would throw an error. To fix this, we use the conditioning zero out node. You can make space for it and place it between nodes if you want. This conditioning does not come from clip like the negative prompt did before. We connect it directly to the negative input on the K sampler. You can place it wherever you want to make the connections clearer, but I like to put it under the positive conditioning to save space.

The conditioning zero out node does exactly what its name suggests. It removes the influence of a conditioning input without breaking the workflow. In simple terms, it takes a conditioning signal, usually text conditioning, and replaces it with a neutral zeroed version, so the model still runs normally, but that conditioning contributes nothing to the generation. Why does this exist and when is it used? In diffusion models, the sampler always expects both positive and negative conditioning inputs. If you want to remove or disable one side, you cannot just unplug it. That would break the workflow. Conditioning zero out is a safe way to say use this conditioning but make it have no effect. So if we run the workflow, everything works fine without any errors. Now the good part about Z
image Turbo is that it is very good at realistic images, but it is also very good at understanding prompts. For example, if I want to create a portrait of a cat with a hat, I can easily get an image like this. But you can also create more complex prompts by using a large language model. Maybe you use ChatGPT, Gemini, or even a local LLM. I will use ChatGPT for this example. I ask it for a detailed photo prompt and give it the details of what I want. ChatGPT then gives me a long detailed prompt that I can copy and paste directly into Comfy UI. So, let us test it again. Now, we get a different cat, but it is still a bit too simple. Let us make it more complex. I go back to ChatGPT and ask for the cat to hold a rose in her mouth and wear a t-shirt that says Pixa. Again, we get a long detailed prompt. And from that prompt, we get this image. Sometimes the model can take things very literally. So, you need to explain details clearly if you want more control. For example, you might need to mention that you want a full rose held horizontally in the mouth and not something else.

Let us create something different now. This time a cartoon bunny, since this series is full of bunnies. Anyway, again we get a nice prompt and the result looks like this. It is pretty cute. Maybe now I want the bunny to be a ninja. Let us see what this prompt generates. And we get our ninja bunny. If we generate again, we get another one. As you can see, compared to older models, the results with different seeds are quite similar. You do not get a huge variation from seed to seed. That is why I recommend using longer prompts and adjusting each prompt carefully. This model is very good at following instructions. So the more precise you are, the more control you get over the result.

Let us open the first workflow again so we can compare it with workflow 5A. Now let us say I use the same long prompt and the same fixed seed for both workflows. If I generate with Z Image, I get a robot like this one, which looks nice and detailed. Now if we try the old Juggernaut model using the same prompt and the same fixed seed, the result looks like this. It is smaller and much less detailed. Let us copy this image and paste it into this workflow so you can clearly see the difference in quality and also how well the image follows the prompt. But maybe this single test is not enough to fully see the difference. So let us try something else. Let us test text generation. Newer models can generate readable text, but older models usually cannot. We normally put the text we want inside quotes. So, let us test that. Look at this result. It looks very good. And it understood the assignment. Now, let us go back to the Juggernaut model and use the same prompt. We get something like this. What is this? What does it even say? Gold gola or something like that. It clearly cannot do text. Let us go back to Z Image and try another test. A red sphere on top of a green cube placed on a black car. We get this realistic result. Z Image is more specialized in realism, but it can also do 3D, paintings, and other styles. Now let us see what Juggernaut does with the same prompt. It gets the red sphere, since that was mentioned first, and then it gets lost and forgets what it needs to do next. So clearly Z Image is a very good model to have, and you will probably spend more time playing with this model. Still, keep an eye on new models because they keep getting smarter and better as they get more training. You have now seen how an all-in-one model works and how we load checkpoints. In the next chapter, we will use models that are split, where clip and VAE are loaded separately, so we can have more control.

Let us talk a little bit more about diffusion models. Open Comfy UI and then open workflow 5A and also workflow 5B so we can compare them.
In the first workflow, Z Image is loaded as an AIO model. AIO means all-in-one. You can see that we used a load checkpoint node to load that model. The checkpoint already contains the diffusion model, the text encoder, and the VAE. Everything is bundled into a single file. Advantages: very easy to use, fewer nodes, less setup required. Good for quick testing and simple workflows. Disadvantages: less flexible. You cannot swap the text encoder. You cannot change the VAE. Harder to customize or optimize. This format is designed for simplicity and convenience.

Now, let us check the second workflow, the 5B version. You can see that we have three nodes now instead of one. We have the load diffusion model node that loads the main model. Then we have the clip loader node that loads the text encoder. And then we have the load VAE node that loads the VAE. So it is like we split the previous checkpoint into separate pieces. And now we have more flexibility. Even though the final result is still Z Image Turbo, the pipeline is modular. Advantages: more control. You can change the text encoder and experiment with different VAEs. Better for advanced workflows and optimization, and easier to update individual components. Disadvantages: more complex setup, more nodes, and a higher chance of misconfiguration if you do not fully understand what each part does.

However, this is actually one of my favorite workflows. The reason is flexibility and efficiency. With a modular setup like this, you save disk space. For example, this VAE is the same VAE used by the Flux model. So, if I already use Flux, I do not need to download the VAE again. With an all-in-one model, every new version means downloading the entire model again, even if only one part changed. In a modular setup, I can update or swap individual components. I can test different text encoders without downloading the main diffusion model again. So while modular workflows require more understanding, they are more efficient, more flexible, and better for experimentation.
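The disk-space argument comes down to where the pieces live on disk. Here is a small sketch of the modular layout this chapter uses, created in a throwaway temp directory so it does not touch a real install. The z_image subfolder is the custom one created in this course; the top-level names follow ComfyUI's usual model folders:

```python
import os
import tempfile

# Sketch of the modular ComfyUI model layout (paths are typical defaults;
# adjust to your own install location).
root = os.path.join(tempfile.mkdtemp(), "ComfyUI", "models")
folders = {
    "diffusion_models/z_image": "main diffusion model only",
    "text_encoders": "clip / text encoder files, shared across models",
    "vae": "VAE files, this one shared with Flux",
}
for sub, purpose in folders.items():
    os.makedirs(os.path.join(root, sub), exist_ok=True)
    print(f"{sub:26} -> {purpose}")
```

Because the text encoder and VAE sit outside the checkpoint, a new model version usually means re-downloading only the diffusion model, not everything.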
That is why I personally prefer this approach. But we did not download these
approach. But we did not download these models yet. I suggest that when you
models yet. I suggest that when you follow this tutorial, you test everything to see what is better or faster on your computer and then keep only the ones you like. There is no
point in keeping all types of models if they do the same thing unless you have a lot of space on your hard disk. So let
us start with the main diffusion model.
This long name is actually describing how the model is built and optimized. Z
image turbo. This is the model family and architecture. FP8. This means the
and architecture. FP8. This means the model uses 8 bits floatingoint precision. FP8 models use much less
precision. FP8 models use much less memory than FP16.
I did not include a link in this tutorial for the FP16 version, but you can find those online if you have more VRAM and want to try them. Scaled refers
to the FP8 format being calibrated for better precision. This improves quality
better precision. This improves quality and stability compared to a raw unscaled FP8 format. You can think of it as FP8
FP8 format. You can think of it as FP8 with tuning for better accuracy. E5M2.
This is the specific FP8 encoding variant used. KJ. It is usually a
variant used. KJ. It is usually a variant tag or builder ID added by the person or team that exported or repackaged the model. It does not change
the model itself. It just helps distinguish between different builds.
Safe tensors. This is the file format.
Safe tensors is a safe and efficient format and is recommended over older formats like CKPT for better stability and speed. We can download this model
and speed. We can download this model from here. And I also added more info
from here. And I also added more info about the model so you can check different versions. You can also see the
different versions. You can also see the author. So now you know what that KJ in
author. So now you know what that KJ in the model name stands for. So let us click here and see where we place it.
Navigate to the comfy UI models folder.
You should already know this by now.
This time we do not use the checkpoints folder because that is usually for complete models that already include most of what they need. Instead we place this one in the diffusion models folder.
To keep things organized, we create a folder called Zimage and place the model inside. Next, we have the text encoder.
I used one recommended by ASD from the Discord server, but there are other text encoders you can try, made by different people. For this one, we again go to the models folder and this time we place it in the text encoders folder. Here I do not create a Zimage subfolder because many text encoders work with multiple models. I usually create subfolders only for main models, LoRAs, and ControlNets, when it is important for the workflow that they match the same base model.
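If you prefer to prepare these folders from a script, here is a minimal sketch. It assumes the default ComfyUI folder names and a relative base path; adjust the path to your own install:

```python
from pathlib import Path

base = Path("ComfyUI") / "models"  # adjust to your ComfyUI install location

# Subfolders used in this chapter: a Zimage subfolder keeps the diffusion
# models organized, while text encoders and VAEs stay shared between models.
for sub in ("diffusion_models/Zimage", "text_encoders", "vae"):
    (base / sub).mkdir(parents=True, exist_ok=True)

print(sorted(p.as_posix() for p in base.rglob("*")))
```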
Then we have the VAE. This is the same VAE that we might also use later for the Flux model. So again we go to the models folder and this time we place it in the VAE folder. Some of these models are large, so wait for them to finish downloading. Once everything is done, press the R key to refresh the node definitions. Now let us check that all models are visible and selected correctly. The Z Image diffusion model is here, the CLIP text encoder is here, and the VAE is also here. That means we have everything we need to run this workflow. So let us click run and see if
we get any errors. Everything works fine and we get this image. What I usually do next is compare the results with the first workflow. When I have multiple models available, I download all of them, test them, keep the ones I like the most, and delete the rest. When I do testing, it can get confusing which model generated which image. So, here's a small trick. Double-click on the canvas, search for iTools add, and select the node called iTools Add Text Overlay. This node comes with the easy installer, but if you have a different ComfyUI version, you can install the iTools nodes from the Manager. We add
this node right after VAE Decode and before the Save Image node. This way,
the final image goes through this node.
The text overlay is added and then the image is saved to disk. For example, we can add the model type in the text overlay. You can also add more text like the model name or other info. Let us say I add FP8 scaled diffusion. So I know this image comes from this workflow. Now
when I run it, text will be added on top of the image. We can control the text, the background color, the font size, and whether the text overlays the image or
is placed under it. Let us disable overlay mode and try again. Now the text is under the image. This way we know exactly which model generated it. Next,
I select this node and press Ctrl + C to copy it. Then I go to the first workflow and press Ctrl + V to paste it. Now we
connect the node the same way as before.
We need a name that represents this model. So let us name this one FP8 all-in-one.
Now I can test it and you can see the text under the image. To make a fair comparison, we use the same settings for both workflows. Let us also enable the bottom panel to see how much time it takes to generate. As you remember, the first time you run a model, it is slower because it loads the model. We can
unload the models using this button and clear the cache using this one. This
lets us compare which model loads faster and which one generates faster. I run it once and you can see the first run took around 8 seconds. The second and third
runs are faster around 3.57 seconds. Now
let us go to the second workflow.
I unload the models and clear the cache.
Then run it again a few times.
This one loads slower, but the second and third runs are faster. On my older PC, the all-in-one model was faster, so it really depends on your system. Test
it yourself and see what works best for you. Now, let us look at quality. We use
a fixed seed with a value of 50 and run the workflow. Then we do the same for the first workflow, same fixed seed, and run it.
Right-click on the image result and copy the image. Then create a new workflow where we compare the two images. I press Ctrl + V to paste the image and you will see it adds a Load Image node with that pasted image. I do the same for the image from the second workflow. Now we have both results. Let us add an Image Compare node so we can compare them.
Connect the first image to image A and the second image to image B. Then run
the workflow. Now we can enlarge and compare them. The results are quite similar but still slightly different.
This happens because I used a text encoder that is different from the one included in the all-in-one workflow. If
I had used the same text encoder, the results would have been much more similar. Let us try again with a different prompt. Maybe we do a portrait photo of an old woman. We get a result like this one. Now let us do the same for the second workflow, and we get another woman for this one. Let us copy
both images and go to the compare workflow. Select the Load Image node and use Ctrl + V to paste the image into that node. Now we can compare the two results. Again, because the text encoder is different, the comparison is a bit harder. Still, I kind of like the FP8 scaled version more. You can see that we use the same settings for both workflows. One has everything included and the other has everything separated. If I searched for the same CLIP used in the AIO workflow, I could get much closer results. This Load CLIP node is something we will use in other workflows as well. As you can see, it has a type option that lets you select different types of models to match the diffusion model you loaded. Do not stress too much if you do not understand everything yet.
It will make more sense as you practice.
If we look at the VAE, you can see where it goes. As you remember, we use it to connect to nodes like VAE Decode and VAE Encode. If we go back to the first workflow, that VAE is coming directly from the one included with the main model. Different model formats do not
change how that knowledge is stored. So
choosing the right format is about balancing quality, speed, memory, and flexibility for your hardware and
workflow. GGUF stands for GPT-Generated Unified Format, and it is a model format designed to run large models efficiently on systems with limited memory. In Comfy
UI, let us go to workflows again and this time open workflow 5B and 5C so we can compare them. The workflow we saw in the previous chapter had three nodes.
Load Diffusion Model, Load CLIP, and Load VAE. And you can see that it was loading safetensors files. If we go to the GGUF workflow, you can see that some nodes are different. We now have a UNet loader that has GGUF in the name, and the file format is GGUF instead of safetensors. We use this node to load GGUF-type diffusion models. For the CLIP, we could have used the previous node to load an existing text encoder, but I wanted to show that you can also use a CLIP Loader GGUF node to load text encoders in GGUF format. The last node is the same Load VAE as before. So, compared to the previous workflow, we only changed two nodes so we can load GGUF models, but we do not have those models yet. So let us go to the notes
and check which node loads which model, and also look at the download links. If we go here, you can see there are many GGUF model versions. Most of the time you will see something with a Q version
in the name like Q2, Q4, Q6 or Q8. Most
of the time I use Q8 models. If that is too big, I switch to Q6. And if that is still too big, I use Q4. The lower the Q
number, the lower the quality of the generation, but the models are smaller and can be faster on limited hardware.
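The Q8 → Q6 → Q4 fallback just described can be written down as a tiny helper. The file sizes and the 2 GB headroom here are assumptions for illustration, not measured values:

```python
# Approximate GGUF file sizes in GB for this model family (assumed values).
QUANT_SIZES = {"Q8": 7.0, "Q6": 5.5, "Q4": 4.0}

def pick_quant(vram_gb: float, headroom_gb: float = 2.0) -> str:
    # Try the highest-quality quantization first, then fall back to smaller ones,
    # leaving some headroom for activations and other processes.
    for name, size in QUANT_SIZES.items():
        if size + headroom_gb <= vram_gb:
            return name
    return "Q4"  # smallest option; may still be slow or spill into system RAM

print(pick_quant(24))  # plenty of VRAM -> Q8
print(pick_quant(8))   # -> Q6
print(pick_quant(6))   # -> Q4
```

The real decision also depends on speed, as the tests later in this chapter show, so treat this only as a starting point.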
Let us look at what this model name means. Z Image Turbo. This is the core model family and variant name. Q4. This means the model is quantized to 4-bit precision. Lower-bit quantization reduces file size and VRAM usage. The K indicates a specific quantization method, usually a block-based or K-quant method, which helps preserve model accuracy even at low bit precision. The
S usually means small or standard variant within that quantization type.
It trades a bit more quality for a smaller footprint compared to M versions which stand for medium. GGUF. This is
the file format. Let us download this model and give it a try. We go to the ComfyUI folder, then to the models folder. This main model, just like in the previous workflow, goes into the diffusion models folder. Since we already have the Z Image folder from the previous chapter, I will place it in the same folder because it is the same base model, just a different quantization. So,
we save it there. Now, let us do the same for the text encoder. We click here to download it. Then again go to the models folder, find the text encoders
folder and place the model there. For
the VAE, if you followed the previous chapters, you should already have it. If
not, download it and place it in the VAE folder. These models are big, so wait for them to finish downloading. After
the download is finished, press the R key to refresh node definitions so Comfy UI can see the new models. Now we go back to the nodes and make sure we can
select the models. The Z image model is there. The text encoder is also there. I
am using the one with GGUF in the name, because if you use a safetensors version here, even if it does not give an error, the results will not be what you expect. For the VAE, we already have it. So now we have everything we need.
Let us run the workflow. The result
looks pretty good for a Q4 version. Let
us open the bottom panel and run it again. You can see that the first time it loads the model, it is slower, but after that it takes around 5 seconds to generate. In my case, this was slower than the all-in-one model or the FP8 scaled version. That does not mean it will be the same on your system. On some systems, it might be faster. That is why I keep saying you should test everything and then keep the best model for your setup. What is best for me will not necessarily be best for you, because we have different video cards, different VRAM amounts, and probably different
drivers. Now I am curious how a larger Q version will perform. So let us go back to the model list. This time I want to test a bigger one. The biggest available
here is Q8 which is around 7 GB in size.
I have 24 GB of VRAM, so I can easily fit this model in memory, and even larger ones. Sometimes, if a model is larger than your available VRAM, it will be slower because it tries to load the model in parts. You lose time during that process, and generation can be slow, or it can even crash ComfyUI and force you to restart it. So let us download this one. We place it in the diffusion models folder, inside the Z Image folder,
right next to the Q4 version. Again,
wait for it to finish downloading.
Luckily, I have a fast internet connection. After that, press R to refresh. So now, in the UNet loader, we can see both models. By the way, UNet is the main neural network inside a diffusion model that predicts what noise
should be removed at each step to turn random noise into an image. First let us change the seed to fixed so we can compare the models properly. I get this
image for Q4. I copy the image, create a new workflow and paste it there. I will
rename the node to Q4 so I know which model was used. Now let us go back to the workflow and select the Q8 model.
Everything stays identical. Only the
model changes. Let us see what we get.
It looks similar at first glance. I copy
this image, go back to the new workflow, and paste it there as well. I rename
this one to Q8.
At first glance, the Q8 version seems to have fewer mistakes and looks clearer.
Let us add an Image Compare node to compare them properly. Connect the two Load Image nodes to the Image Compare node.
The first image shown is image A, which is the Q4 version. As we move the cursor to the right, we see the Q8 version. In
my opinion, Q8 has better details and fewer errors. For example, some bolts seem to be missing in the Q4 version, while the Q8 version looks more complete. In most cases, Q8 will be better than Q6, and better than Q4, in terms of quality. But now, let us check the speed.
The first time, Q8 took longer to load because it is a 7 GB model. Let us
change the seed and try again. Now the
second run takes under 4 seconds. Let us
try once more. We change the seed again and once more it takes under 4 seconds.
Now let us switch back to the Q4 model.
This one is lower quality but also smaller. You can see that the first time, it loads faster. Let us change the seed and try again. The second run takes more than 5 seconds. Let us try one more time. And again, it takes more than 5 seconds. This is why I keep saying you should test all of them and then decide.
For me, Q8 is faster and gives better quality than Q4, but that is because my video card probably works better with that quantization. On your system, especially if you have an older card, it might be the opposite, and Q4 could be faster. So please test them yourself and then keep the one that gives you the best quality and the best speed on your system.
So let's go to workflows again. And now it might make sense why I named all three of these workflows workflow 5: they are workflows for the same model, just different model types. So let's open workflow 5A and you will see how we can adapt the workflow. Let's move this to the side. So, this has an all-in-one model with everything included. We want to change it into a workflow where the models are split. So, let's start with the model. Instead of Load Checkpoint, we search for Load Diffusion Model. This one only loads the model, without CLIP and VAE. And we select the Z Image model from the list. Then, we need a node that has that CLIP output, that loads the text encoder. So, we search now for a node called Load CLIP. Let's make it bigger so we can see the parameters.
We first select the text encoder. Then
we select the type. Z Image uses Lumina 2. Lumina means light. You can think of it like reaching the end of a tunnel: Z is the last letter of the alphabet, and at the end you see the light. Lumina 2 represents a newer, clearer way for the model to understand prompts and guide image generation. It simply means more advanced guidance compared to older models. Then what is left is the VAE. So, we use the Load VAE node and we select that VAE model. So,
now all that's left to do is to redo the connections. We drag a link from model to model. Pretty easy, right? Now, we need the CLIP. So, let's drag another link. And all that's left is the VAE. This one will connect to the VAE Decode node. And if the workflow is image to image, it will go to VAE Encode also. Now, we can get rid of the Load Checkpoint node. So now we successfully
replaced all the models, and basically we have the workflow version 5B that we used before. So let's run it, and it all works okay as it should. Let's say the model we use now is too big and our video card doesn't have enough VRAM. Then we can try a GGUF model to see if it works faster or better. So let's search for a node again, and this time search for the UNet loader, the one that has GGUF in the name. So in this node we can select a GGUF model. You can see I downloaded two versions before. So let's say Q4 is smaller in size than the FP8 version in this case. So it has better chances to run faster than a bigger model. But as you saw before, on my computer Q8 was faster. So maybe I will use that to get better quality instead.
Let's connect the model. And now we can remove that node. So we replaced an FP8 safetensors model with a GGUF version, and if we run this workflow, you can see it works just fine and we got a nice result. If for some reason you are not happy with the text encoder, maybe it is not so accurate or it is too big, we can try a GGUF version of the text encoder also. So let's delete that node and search for CLIP Loader, the GGUF version. You can see it has CLIPLoader in one word. So now we can select the GGUF model. And of course, we need to adapt the type, since it is not Stable Diffusion. It is Lumina 2 instead. Remember that light at the end of the tunnel. Then link the CLIP to the text encode prompt. And basically now we have the workflow 5C. So you saw that having a modular version allows you to change models and have more freedom, just like
on your computer. If you're not happy with a mouse or your printer, you can change it with a smarter or faster version. Now, if you do have enough VRAM, you can try to increase the size for width and height to get more details. For example, at this size, I got this image, and now we can see more details on those cables and overall. But usually for Z Image, I use values between 1024 and 1280 pixels. So, at
the moment of this recording, Zimage is a pretty good model to have. It is free and you can generate all kinds of stuff with it. Let's compare a few of these
models to see what the difference is. So, for this one, I compared the FP8 all-in-one version with the FP16 all-in-one version, which is double the size. The results are quite similar. Maybe the FP8 is a little more desaturated compared to FP16, and FP16 might be a little bit clearer, but it is not a huge difference. Both are good quality. For the Viking image, the FP8 version has fewer details in some areas. In FP16, it added some extra things, like more ornaments. Again, FP8 looks a little more desaturated. For the bunny, both look good. So, I would say if FP8 is faster, has half the size, and the
results are very close, you can get away with FP8 and keep that. Now, let's
compare the FP8 version with the FP8 scaled version. Keep in mind that the text encoder is also different in this case compared to the one included in the first model, but the results are still quite similar. Sometimes FP8 does it better, sometimes the scaled version does it better. So if you do more tests, you can decide which one is better for you. Since the results are very close, again, it makes sense to keep the one that is faster on your system. Now, let's compare the FP8 version with the GGUF version. Instead of the Q4 version, I will use the Q8 version, downloaded from here, so we can see the difference.
For the portrait, it looks a little clearer on the Q8 version. For the
Viking, some details are more defined on the Q8 version. For the Bunny, it is pretty similar for me. For other models we will test in the future, the difference might be bigger and more
obvious, but in this case, the difference is quite subtle. So, which
one will I keep for my video card? Maybe
the FP8 scaled or the Q8 version, mainly because they are modular and I can save space and time when I use the same models for other workflows. In this
chapter, we explore batch generation and styles. So, let's open another workflow,
this time workflow 5A since it has fewer nodes and you can see things better. But
the methods I show work with any workflow. Right now, each time you press the run button, the workflow runs once and you get a single image. But what if you want more images and you do not want to click run every time? In this node, we have an option for batch. By default, it is set to one. You can change that, but keep in mind it will use more VRAM, because it is like running multiple workflows at the same time. If your video card can handle it, it will be faster than generating one image at a time. So now we get two images. If I
change it to four and run again, we get four images. If we toggle the bottom
four images. If we toggle the bottom panel, we can see the time it took for one image, for two images, and for four images. If we multiply 3.77,
images. If we multiply 3.77, which is the time for one generation, by four, we get over 15 seconds. But
because we used batch, it only took 13 seconds. So, you need to see what batch
seconds. So, you need to see what batch size works best for your video card. I
might be able to use a bigger batch, but you might need a smaller one. Now, from
these four images, we can click on any of them to open it bigger. These images
are saved in the output folder as well.
To close the big preview, you can use the X in the top right. Let's open
another one. You can also navigate using the buttons in the bottom right corner, so you can check all generations. The
bigger the image, the more VRAM it will need. Let's say I set the batch to eight. Since the image size is quite small, the result is eight images. You can check the results and pick your favorite. You can right-click on an image and save it in any folder you want. Now
let's change the batch back to one. So
we only get one image. Next to run, we have an arrow that shows multiple options. Here we also have batch count.
This is not the same as the batch we used before. Think of this like a counter where you tell it how many times to run the workflow. So if I set the value to four and hit run, it will run
once, then again, and again until it has run four times. This is a bit slower than the previous batch method, but it uses less VRAM. If we add these values,
we get over 14 seconds. With the batch and empty latent, we got 13 seconds. You
might say 1 second is not much, but if you use bigger images and longer workflows, seconds can quickly turn into minutes. Let's change the batch back to one. And let's explore more run options.
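The batch versus batch count trade-off boils down to simple arithmetic. The single-run and batch times are the ones measured above; the batch-count total is rounded to the "over 14 seconds" mentioned, so treat the exact values as illustrative:

```python
single = 3.77          # seconds for one image, generated one at a time
batch4 = 13.0          # seconds for a batch of 4 in one latent (more VRAM)
batch_count4 = 14.0    # approx. seconds for batch count 4 (sequential runs)

sequential = 4 * single
print(f"4 single runs:  {sequential:.2f} s")
print(f"batch of 4:     {batch4:.2f} s")
print(f"batch count 4:  {batch_count4:.2f} s")
# Batch wins on speed but needs the most VRAM; batch count trades a little
# speed for much lower VRAM use.
```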
Run on change will run the workflow when we change a value. So if I change the seed, it will start running. It should
stop after the run. But I am not sure if this is a bug or if this is how it is supposed to work. Because the seed is random, it keeps generating continuously after the first change. But if the seed
is fixed, it only runs the workflow when I make a change and then it stops. So I
will stop it manually by switching back to run. If I change it to run instant, it will generate forever until you stop it. So do not forget to stop it by going to the arrow and selecting run. After it
finishes that workflow, it will stop.
But what if we want to run multiple prompts? Until now, we only had one prompt, and the seed was different. But for the Z Image model, for example, the seed variation is not that big compared to other models like Flux or Stable Diffusion. So let's search for a node called iTools Line Loader. This node
loads each line as a prompt. If we drag a link from this line loader output and connect it to the text encoder, a small dot will appear in the top left corner
of that text input. By default, you have three prompts here, cat, dog, and bunny.
Let's say the first prompt is a cat photo. The second prompt is a bunny with a flower. And the third prompt is a lion logo. We have a seed here that decides which prompt will generate. And we also have control after generate. Randomize means that after each generation, the seed will change to a different random value. So let's run it. We got a cat, and now we have a different seed.
For this seed we got a bunny. Now let's
change the seed to fixed so we can understand better how this works. For
the seed we put zero. In computer
programming, lists usually start with zero, not with one. So instead of 1, 2, 3, it is 0, 1, and 2. So 0 corresponds to the cat prompt. If I run the workflow, I get a photo of a cat. If I change the seed to one, it corresponds to the second prompt, which is the bunny. So the
result is a bunny. And for seed 2, we get a lion logo. Now we know the order in which this node uses the prompts. Can
you guess what we will get for seed 3?
It will start over with the first prompt. So the result will be a cat photo. Let's add another prompt, like a rose, and maybe a house with a car in front. I will start with zero so it starts with the first prompt. Then for
control after generate I will use increment so it starts with the first prompt and continues with the next one and so on. This way it is more controlled and not random. Now that we
have five prompts, I can change the batch to five so it runs the workflow five times. You can see it will generate all those images, one prompt at a time, in order, and it will stop after five generations.
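The way the seed maps to a prompt is just zero-based indexing with wrap-around. A sketch of the behaviour we just observed (the node's real internals may differ):

```python
prompts = [
    "a cat photo",
    "a bunny with a flower",
    "a lion logo",
]

def prompt_for_seed(seed: int) -> str:
    # Zero-based index that wraps around when the seed exceeds the list length.
    return prompts[seed % len(prompts)]

print(prompt_for_seed(0))  # a cat photo
print(prompt_for_seed(1))  # a bunny with a flower
print(prompt_for_seed(2))  # a lion logo
print(prompt_for_seed(3))  # wraps around -> a cat photo
```

With control after generate set to increment, the seed simply counts up, so the prompts cycle in order.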
You can also put 10 if you want to get two generations for each prompt or you can let it run continuously and stop it when you get something you like. We saw
how we can load prompts line by line, but we can also load prompts from a text file. Let's search for iTools Prompt Loader. As the name says, this node loads prompts from a file. Let's drag a link from it to the positive prompt. You
can see here it says file path. We
already have an example with a prompts.txt file. So let's run it. It will pick a random prompt from that file. And the result is this cat. Now
let's find that prompts text file. Let's
go to the ComfyUI folder, then go to custom nodes, and here look for the ComfyUI iTools folder. These are all the files used by that node. Basically, the node itself looks like this. If we go to the examples folder, we have a text file with prompts. Let's open it. You can see we have a few prompts here. And the image that was generated corresponds to one of these prompts. You can delete everything and add your own prompts here one by one. Or you can ask ChatGPT to generate a bunch of prompts. So maybe I
will add one for a dog, maybe one for a cat, and one for a rose. Now I can save that file and close it.
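Conceptually, the prompt loader just picks a random non-empty line from that file. A minimal sketch of that behaviour (the node's actual implementation may differ, and the file contents here stand in for the prompts we just typed):

```python
import random
from pathlib import Path

# Write an example prompts.txt like the one edited above.
Path("prompts.txt").write_text(
    "a photo of a dog\n"
    "a cat playing with a mouse\n"
    "a red rose\n"
)

# Load non-empty lines and pick one at random, once per run.
lines = [line.strip() for line in Path("prompts.txt").read_text().splitlines() if line.strip()]
print(random.choice(lines))
```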
Let's go back to ComfyUI and generate again. It should pick prompts from the same text file. Now we got that cat playing with a mouse. Let me remove this node and try again to see if we get another prompt. And now we got a rose.
If the text file is in a different location, you just add the path to that file here so it knows where to load it from. And of course, you can change to run instant and let it run, then stop it when you have enough images generated.
Let's delete this node and I will show you more things you can do. Let's search
for a node called iTools Prompt Styler and select this one. This node picks prompts or art styles from a file. We
have positive and negative prompts. But
since our workflow only uses the positive prompt, we drag a link from there and connect it to the positive prompt input. Here we have an area where we can type our prompt. Let's say I type a white bunny holding a rose. Then we
can select the style file. These files
that contain different prompts are stored locally. Let's go to the custom nodes folder again, then to iTools, and this time go to styles. You can see here a few example style files, which
are actually YAML files. If we go to more examples, we have even more. Now,
if we look back here at the file list, we can see exactly those files. Among
them, there is one called Pixaroma. I
asked the creator to add my file there so you can access it easily. Thanks,
Mikotti. Once you have the file selected, you can choose a template from that file. You can see here different templates. For example, I can select a 3D icon or something else. What is important to remember is that you select the file first and then the template inside it. For example, let's open one of these files with Notepad so we can see what is inside. Each template looks like this.
You have the template title, the negative prompt, and the positive prompt. As you saw in our workflow, we only use the positive prompt this time, so it will only pick that part. In the positive prompt, you can see the word prompt inside brackets. That is where it takes your prompt and combines it with the rest of the template prompt. So, if I do not have anything selected here for the template, it will use something like landscape photography of the prompt, and instead of the word prompt, it will insert a white bunny holding a rose.
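That placeholder mechanic is simple enough to sketch in Python. The template text here is illustrative, not copied from a real style file:

```python
# Sketch of how a style template combines with your typed prompt: the
# template's positive prompt holds a {prompt} placeholder that is
# replaced by whatever you entered in the node.
template_positive = "landscape photography of {prompt}, golden hour, high detail"
user_prompt = "a white bunny holding a rose"

final_prompt = template_positive.replace("{prompt}", user_prompt)
print(final_prompt)
```

Swapping the template swaps everything around the placeholder while your short subject prompt stays the same.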
Basically, we recreated what these styles do. This system saves you time by letting you write a short prompt and combine it with a ready-made prompt from a template. This was created back in the days of Stable Diffusion models, when we did not have access to AI prompt generators. It still works today with most models that recognize these prompts, even though you have much more freedom using a custom prompt made with ChatGPT. So if the prompt is a white bunny holding a rose, and for the file I select the Pixaroma file, then for the template I can filter by landscape and select photography landscape. Now when I run it, it should combine my bunny prompt with the landscape photography prompt.
And the result is this one. So everything works quite nicely. Let's change the template. Let's say I select 3D icon and run the workflow. And we get this 3D icon of a bunny holding a rose. Now let's try an ancient Egyptian mural. We run it again and we get this mural. And you can clearly see our bunny in the image. Let's say I select the rococo art style. The result is this decorative style illustration of the bunny. Now let's open the Pixaroma styles file with Notepad. You can see all the templates and prompts for each style, and you can edit them if you want. Just keep the same format. Otherwise, it will not work. Let's say I want to use the template for surreal toy. If I use it in the workflow, it will take this prompt and replace the word prompt with my bunny holding a rose. So, let's test it. From the templates, I search for surreal and select that toy style. Now, let's run the workflow. And we get this 3D surreal bunny with a rose. Pretty cool.
Let's scroll down and see what else we can use. Let's say Afrofuturism art. That means it will use that specific prompt. Let's change the style and test it. And the result is this one. Keep in mind each model will interpret these prompts differently depending on how it was trained. There are over 300 styles or prompts saved in this file, from 3D to art styles, painting, photography, design, all kinds that I use most often.
Let's say I select the vector coloring book page style. Now, when I run it, I get this clean coloring page design. Of course, if you want it to be more unique, give more information in the prompt, like how the bunny looks, how it is dressed, how the environment looks, and maybe make it fit your story. Let's say I want to do a cartoon illustration. Let's search the list to see if we have something like that. For example, I can select a soft 3D cartoon environment and see what we get. And the result is this one. Let's search for cute and test this cute cyberpunk style. And we get this illustration. These are good for discovering art styles you might not have thought to try yet. Now, let's remove this node and search again. This time, we look for iTools Prompt Styler Extra. It is called Extra because it has slots for multiple files and templates. Let's connect it to the positive prompt.
For the base file, let's select the Pixaroma file, since that one has the most styles. For the second file, I will use the same one. Let's set both to random, so we get a random combination of two styles. If I run it now, I get something like this. There is no bunny, because we added a new node and we did not add a prompt yet. Let's drag a link from the output called used templates. This outputs the actual styles that were used. Then search for preview and add a Preview as Text node. Now when I run it, you can see what styles it combined.
Reflection with fantasy. Now, let's add the prompt, "A white bunny holding a rose," and generate again. This time, it combined a propaganda art style with knitting art, something you probably would not think to combine. Let's select a third file, again the Pixaroma file. For the third style, let's select random, or any other style you want. Now, if we look again, it combined Japanese traditional sticker and fine art. Let's run it once more and we get this gilded fantasy bunny. Pretty cool. Let's try again. And this time we get a cute minimal line art style. Of course, you can also manually select which styles to combine. For example, let's choose a game asset style combined with low poly. And for the third one, select atompunk. And the result is this one. You can also run it multiple times to get different seeds. By now, you should start to get an idea of how styles work. Let's try one last combination. Change low poly to a steampunk style. We get this image, because we used a game asset style. If I change the game asset to cute cartoon, I get this cute bunny in a steampunk environment. So, create your own styles for the things you use most often, or use ChatGPT or other large language models to generate longer prompts that describe exactly what you need. In the
previous chapter, we saw how we can use different prompts to change the style of the image. But if we want a style that the model did not learn, we cannot generate that style. For that, we have LoRA files, which add extra information to the main model. We talked more about this in episode 13. Let's go to workflows, and this time, let's select workflow number six, the one with LoRA in the name. This is a simple Z Image text-to-image workflow. In fact, if we remove these nodes, we get exactly workflow 5A that we used before. So, let's undo that. What is different here is this LoRA Loader Model Only node, which allows us to load a LoRA from our computer. I just changed the color to blue. That is all. The node with trigger words is just a simple note. Again, revisit chapter 13 for more details. So, let's go and download a LoRA. I created a LoRA for a girl with white hair, and you can download it from here. After that, navigate to ComfyUI, go to models, and then open the loras folder. Here, we already have one from chapter 13, the SD 1.5 LoRA. Now, we create a new folder called zimage, since this LoRA only works with the Z Image model, and we save the LoRA inside that folder. After the LoRA is downloaded, press the R key to refresh the node definitions.
Now, if we go to the LoRA loader, we can select that LoRA. You can see the folder and the LoRA name there. Just like for all other LoRAs, this is the trigger word that I used when I trained that LoRA. I use that in the prompt together with more words to describe what I want to generate. Now, when I generate, I get this girl with white hair. The LoRA I am using here is a character LoRA. There are also LoRAs for styles, objects, or functional ones that speed things up. A character LoRA also allows you to keep a character consistent, so even if you change the prompt and keep the trigger words, you get the same character, which is very useful. There are many LoRAs trained by people online on sites like Hugging Face or Civitai. Over time, you can also learn how to train them yourself, either online or locally, if you have enough VRAM. Let's search for a LoRA on the Civitai website. Again, if you are from the UK, you will need a VPN to bypass the restrictions they set for your country. Let's go to models. Then we can filter them. Set the time period to all. For model type, select LoRA. For base model, select Z Image Turbo. Now we should see only LoRAs compatible with our base model. We can sort by highest rated. We have quite a few here. Let's pick one at random, maybe this one that lets us create character design sheets. Now we are on the LoRA page. At the top, you can see this LoRA is available for different models, but we want the Z Image version. We check the type to make sure it is a LoRA and that the base model is correct. We also check if it has trigger words or other settings. Then we can download it from here. You must be logged in to download models. We save the LoRA in the same loras folder, inside the zimage folder. After the download finishes, press the R key to refresh. Now we should be able to see that LoRA in the list and select it.
Let's see what else it says about this LoRA so we can learn more. Here it shows the trigger words. I can copy those and paste them in a note so I have them for later. They also give an example prompt showing how to use it. Let's copy that and paste it into the positive prompt. I will remove the beginning and ending quotes. Now, let's copy the trigger words and place them here instead of the previous trigger words. We also have a subject, so let's say a white bunny warrior. For art style, maybe I add a 3D render style.
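It helps to know what the strength setting does conceptually. A LoRA stores a small learned delta that gets added on top of the base model's weights, scaled by the strength. A toy sketch with made-up numbers, not real model code:

```python
# Toy illustration of LoRA strength: the LoRA's learned delta is scaled
# by the strength and added to the base weights.
# strength 0 = base model only; strength 1 = full LoRA effect.
def apply_lora(base_weights, lora_delta, strength=1.0):
    return [w + strength * d for w, d in zip(base_weights, lora_delta)]

base = [0.50, -0.20, 0.10]   # pretend base model weights
delta = [0.10, 0.30, -0.05]  # pretend LoRA delta

print(apply_lora(base, delta, strength=1.0))  # full effect
print(apply_lora(base, delta, strength=0.5))  # halved effect
```

This is why lowering the weight softens the LoRA's influence instead of switching it off.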
The rest looks fine. For the model strength, I will use one. If it is too strong, I can reduce the weight. For the size, let's make it bigger so we get more details. Now let's run the workflow. We get this character sheet, which is not bad. This could be useful for concept artists to see different angles. Let's run it again with a different seed. And we get this result. Now let's change some things in the prompt. Maybe it is a medieval bunny. For art style, I try a vector art style. Let's run the workflow again. Now we get a different image. This seed does not look that good, so maybe I try another seed to see if I get something better.
Again, not perfect, but at least it gives some ideas. Let's go back to Civitai and look again at models. You can see there are LoRAs for all kinds of things. That does not mean all of them are great. They are trained by people like you and me and shared for free. Depending on the training, some are very good and some are not so good. Training is never perfect. If you want to see how much the LoRA influences the result, we can test that too. Change the seed to fixed and generate once to see the result. In my case, I got this image. Now, let's go to the LoRA loader node. Right-click on it and select bypass. Then, we run the workflow again with the same settings and prompt. You can see that without the LoRA, we do not get a character sheet anymore. So, this LoRA clearly helps with creating multiple characters on a sheet. Remember, a LoRA is like an add-on to the main model. It adds extra training to that model, like the model took a new course and learned how to do character sheets. Hope that
helps. I explained ControlNet basics in chapter 14, but there are models like Z Image that need different nodes to run ControlNet. Let's go to workflows. Now I want to open workflow 4 and also workflow 7, since both are using ControlNet, and you can see the difference. So let's go to the Juggernaut workflow, which is a Stable Diffusion 1.5 model. Here we use a Load ControlNet Model node, and it is the same node used for SDXL models or Flux models. Then for ControlNet we have different models, like depth, canny, pose, or other types that control the image generation. Now if we go to the Z Image Turbo workflow, we have a different node here called Model Patch Loader. Here we load a ControlNet model, and it is called union because it has depth, canny, and pose integrated into one single model. So we do not have to keep changing the model. It is one model that does everything it needs. Back in the Juggernaut workflow, we had a preprocessor node that converted our image into a format the ControlNet model understands. For the Z Image workflow, that part remains the same. We can try different preprocessors like canny, depth, or DWPose, and they will work with this model. For the last part in the Juggernaut workflow, we had an Apply ControlNet node between the prompts and the KSampler, with different parameters. For Z Image Turbo, the node is different. It is called Qwen Image DiffSynth ControlNet. Here we only control the strength, which I set to 0.8, so it is not too strong.
Now let's download the required models. By now you should already have the main models downloaded, either FP8, FP16, or BF16. The principle is the same even if you use a GGUF version or other types. We also need to download the ControlNet model, because we are not using the Load ControlNet Model node but the Model Patch Loader. We need to place this model in the model_patches folder. So let's click here to download it. Go to the ComfyUI folder, then to models, and here you will find the model_patches folder. Let's save the model here. Wait for it to download, since it is around 3 GB. Also, keep in mind that over time more versions can appear, like version two or three, so always check if there is a newer version available. After the download is finished, press the R key to refresh the node definitions so the model appears in the list. Then, select that model from the drop-down. Now we should have everything we need to run the workflow. I have a robot image loaded here in the Load Image node. For the preprocessor I can use depth or canny, but let's start with canny. For the resolution, I will make it bigger so the canny map has better details. Then for the prompt, we describe what we want to get, and then we run the workflow. Now we can see that we got a canny map that ControlNet understands.
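What a canny-style preprocessor produces can be illustrated with a toy example. Real preprocessors run the full Canny algorithm on a 2D image; this one-dimensional gradient threshold only shows the core idea of marking where brightness changes sharply:

```python
# Toy 1-D "edge map": mark positions where neighboring brightness
# values differ by more than a threshold. ControlNet receives a map
# like this (in 2-D) instead of the raw photo.
def edge_map(row, threshold=50):
    return [1 if abs(row[i + 1] - row[i]) > threshold else 0
            for i in range(len(row) - 1)]

pixels = [10, 12, 11, 200, 201, 199, 20, 22]  # toy brightness values
print(edge_map(pixels))  # 1s appear where the values jump
```

The generated image then only has to respect where those jumps are, which is why the robot's outline survives while everything else can change.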
Look at the result. It looks much better, and it follows the edges of the original robot. Let's try with a different image. As you remember, in the input folder I added some images you can use. So, let's say I load this sphere and cube image. For the preprocessor, let's use depth this time. Then, let's adjust the prompt to fit. Maybe a green sphere on top of a golden cube in the desert, golden hour, alien. Now, when I run the workflow, I get a depth map for that image. For the most part, it got it right, except the ground. Let's help it understand what I want. So, I will add to the prompt: the sphere and cube levitate in the air. Let's run it again and see if it understands it better. Now, we got exactly what we asked for. You can also give the image to ChatGPT and ask for a prompt, together with instructions on how you want it to look. Let's try something else.
This time, let's upload that woman in a yoga pose that the Juggernaut model struggled with, to see how much the models advanced in the last two years. For the preprocessor, I use the DWPose preprocessor. For the prompt, I will add a photo of a woman dressed in white doing yoga on top of a mountain. Maybe I add photo taken with a DSLR camera. Not sure if it will take that too literally. So now we got our pose skeleton, which looks correct. We also got an Asian woman, which Z Image tends to generate when you do not specify what kind of woman it is. It also added a DSLR camera on the ground, which I do not want. So let's go back to the prompt. I remove the DSLR part, and for the woman I add that she is European. Now let's test again. The result is actually great.
Same pose, the clothes I asked for, and on the mountains. A perfect result. What do you think? Let's try to recreate this ControlNet workflow so you can practice. Go to workflows and let's open workflow 5A, since this is a simple text-to-image workflow for the Z Image model. Search for qwen image, written as one word. Then select the DiffSynth ControlNet node. Now we need to connect this between the model and the KSampler, so let's add the links so everything goes through this node. Let's see what other inputs we have here. It says model patch, so let's search for that node. We add the Model Patch Loader.
And here we select the union ControlNet model. Now we drag a connection from this node to the ControlNet node. We also need the VAE, and we already know where it is in this workflow, so we connect that as well. All that is left now is an image. To load an image, we use the Load Image node, so search for that node and add it. If we try to connect the image directly, it will not work correctly, because this model is trained with canny, depth, and pose. So we need something to convert the image into those formats. Search for AIO and add the AIO Aux Preprocessor node. Now our image goes through this preprocessor. From the list, we can select one, for example, the Depth Anything preprocessor. For the resolution, we can increase it a bit to get more detail. Now, we connect the output of this node to the ControlNet node, since this is the correct format that ControlNet understands. We can also add a preview node to see how the processed image looks. All that is left now is to adjust the prompt. Let's say the prompt is a modern house in winter. We can also increase the width and height to get more details. Now, we are ready to test the workflow. We can see the depth map of the building. We can enlarge it to see it better. The result looks like this. It is similar, but not exactly the same building shape. You could try a more detailed prompt or a different preprocessor. Let's add an Image Compare node to see the differences. I want the original image before processing, so I connect it to image A. Then, just after the VAE Decode, I connect that output to image B. Now let's run the workflow again and make the Image Compare node larger. We can see that it shares some building edges with the original image, but not all of them. If we want more accuracy, we can change the preprocessor. Let's select a canny preprocessor instead. Now when we run it, you can see it captures all the edges in the canny map. The result should be more accurate.
And this is the result we get. Now we can see many things in common with the original image. Keep in mind this is controlled mainly by edges, so it will not be exactly the same building. We can get more control later when we cover edit models like Flux 2, Qwen Edit, or Nano Banana Pro. Up to now, everything we did in ComfyUI happened locally inside the interface. We loaded models, connected nodes, ran workflows, and generated images on our own machine. API nodes are different. They allow ComfyUI to communicate with external services. An API is simply a way for one program to talk to another program over the internet. Instead of doing everything locally, we can send data out, let another service process it, and then receive a result back. Think of it like this: local nodes are tools on your desk; API nodes are tools you rent remotely. You send instructions and you get results back. In ComfyUI, you can click on the plus to add a new blank workflow. Then double-click on the canvas and search, for example, for ChatGPT. You can see that it says API node under the node name. Let's select this node. Now, this node looks different compared to others. It comes already colored in gold, like a VIP version. On top, it tells you how many credits this node will consume depending on the settings. Those credits change based on what you use. Here we have a list of models from OpenAI that are accessible through the API. The letters API stand for application programming interface.
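Under the hood, an API node does something like the following sketch: package your inputs as structured data, send them to a remote server, and read back the reply. The endpoint URL and payload fields below are hypothetical, purely for illustration; they are not ComfyUI's or OpenAI's real API:

```python
import json
import urllib.request

# Hypothetical payload builder: the request is just structured data.
def build_payload(model, prompt):
    return json.dumps({"model": model, "prompt": prompt}).encode("utf-8")

# Hypothetical call: the heavy computation happens on the remote
# server, not on your GPU. (Not executed here; the URL is made up.)
def call_remote_model(model, prompt):
    request = urllib.request.Request(
        "https://api.example.com/v1/generate",  # made-up endpoint
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

The node hides all of this behind its widgets; you only see the prompt going in and the result coming out.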
For example, if we select a big model, it can cost between 2 and 8 credits, depending on what you ask from it and how long the answer is. If I change to ChatGPT mini, it is almost zero credits. It is not zero, but it is 0 point something. So, it is quite cheap. This node has a string output. So, like ChatGPT, you ask something and you get a text reply back. Let's drag a link and search for a node that displays text. Search for preview, and we have this Preview as Text node that we can add. Here we will get our reply from the ChatGPT model. Let's say I ask it to generate a prompt for a cute cartoon bunny, something 3D. Now when I try to run that, it asks me to sign in if I want to use the API. We could use this login button, or we can cancel and go to the menu, then settings. Here we have the user section in the settings, and again we have the sign-in option. Let's click sign in. If you have a ComfyUI account, you can use that, or you can simply log in with Google, which is a faster option for me. Then you select your Gmail address from the list and you will be signed in. Now you also have the option to log out. So now we are connected, but we need credits to run API nodes. Let's go to credits. Credits are like money. You basically use real money to buy credits that you can spend on a lot of models that are available through the API in ComfyUI. I have here some credits I bought a while back. I can click on purchase credits, and then it asks me how much I want to spend. For example, I have $10 here, but that might be too much for a beginner to spend on a first try. Let's click on minus to see if we can go lower. The minimum you can buy is 1,550 credits, using $5. Then you can click continue to payment. Depending on your country, you have different options to purchase. You can use Link, but you can also choose without Link if you do not have one set up. Here you have options to pay with a card, or you can use Google Pay if you want, and you also have the option to purchase as a business. Back in ComfyUI, I have enough credits to test a few nodes in today's tutorial. Now, when I run the workflow again, this node sends information to the servers, wherever those are located in the cloud, on OpenAI or somewhere else. Depending on the situation, sometimes it is faster, sometimes it is slower. From the workflow point of view, nothing special is happening. Nodes still connect left to right. Data still flows through cables. The only difference is where the computation happens. Local nodes use your GPU or CPU. API nodes use someone else's hardware. This has advantages and disadvantages.
Advantages: you can use very powerful models that you cannot run locally, you save local VRAM and system resources, and some APIs are faster for specific tasks. Disadvantages: you depend on an internet connection, there may be usage limits, it costs credits so it is not free, and you have less control over model internals. So we got the response from ChatGPT, and it gave us multiple prompts and suggestions instead of a single prompt. So let's refine what we asked and tell it to generate a single prompt. Maybe repeat it once again to reinforce that.
Let's run it again. This time we got a single prompt, just like I asked. Now we can copy the prompt and paste it into another workflow if we want. With this node selected, I will use Ctrl+C to copy the node. Then let's go to workflows and open a workflow like this 5A workflow that uses Z Image Turbo, which we know likes long prompts. I will move this node to the side, then Ctrl+V to paste that node. To connect this node, we just drag a link to the positive prompt. Now we have a mix of local models that take the prompt from an API node. We can also drag a preview here. Let me search for Preview as Text. Now we can see what prompt it gave us. I can rename it prompt, so I know this is the prompt. Let's run the workflow. You can see it generated a prompt for me. Then it continues to the next part of the workflow and generates the image. This can be quite useful. There are free models that can also do this, but we will talk about that in another episode. Let's change the prompt to be a ninja bunny, maybe in an action pose. Generate again. We get a new prompt describing that bunny, and the result is this image.
There are many API nodes and many options to connect them. Let's go back to the previous workflow, where it was just those two nodes. Now let's add a Concatenate node, a node that lets you combine two strings or prompts. Let me remove this prompt, since we want to get the prompt from the Concatenate node. I will use it for string B for now, and then connect this Concatenate node here. I will add a green color so it looks like a positive prompt node. For the first part, I write a cute cartoon bunny ninja. For the second part, I write something like: use the prompt to generate a single detailed prompt, be creative, adapt the prompt to match the prompt style and mood. You can use all kinds of ChatGPT formulas here to get exactly what you want. Let's drag a link to a new Preview as Text node so we can see the result of the Concatenate node. Maybe I name it prompt, but I might change that later. Still exploring what we can do. When we run it, you can see I forgot to add a separator, so it just fused the ninja prompt with "use the prompt to generate." In the end, it still understood and generated the prompt.
But let's fix that delimiter and add a comma and a space. Of course, you can split this into multiple nodes and make workflows more complex, one going into another workflow, and so on. This is the prompt that goes into ChatGPT, and this is the prompt that comes out of ChatGPT, the one we want to use in other workflows. Now we can run it again. You can see the input prompt to ChatGPT is this combined text. I like to use concatenate because I can easily change the first prompt without changing the formula below, so it is easier to edit. The result is this long prompt for the Ninja Bunny.
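What the Concatenate node computes is easy to sketch; this is a simplified stand-in, not the node's actual implementation:

```python
# Sketch of a string-concatenate node: join string A and string B with
# a delimiter. With an empty delimiter the two prompts fuse together,
# which is the mistake the ", " separator fixes.
def concatenate(a, b, delimiter=", "):
    return a + delimiter + b

subject = "a cute cartoon bunny ninja"
instruction = "use the prompt to generate a single detailed prompt"
print(concatenate(subject, instruction))
```

Keeping the subject in string A and the reusable formula in string B is exactly why editing only the first prompt is so quick.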
These two nodes are the same, so I only need one, and I remove the other. What we did here is split a workflow into multiple pieces so we can easily edit the prompt without worrying about the formula. I can quickly change the first prompt, run it again, and get a new prompt. It is quite easy to use, at the cost of a few cents or a fraction of a credit. I probably do not need that preview anymore, since I know how they are combined, so I will leave just one Concatenate node, the ChatGPT node, and the preview of the final prompt. Now that we have this, we can save it. Hold Ctrl and drag a selection over all the nodes. Right-click on the canvas, then use save selected as template. Give it a name, maybe ChatGPT prompt, so we know it generates prompts. Now we can paste it into any workflow. Let me open that 5A workflow again. Since it was already open, I will close it, because I do not want the extra nodes, then open it again fresh with default values. I move this to the side. Right-click on the canvas. Go to node templates. And now we have that template there. We can move it wherever we want. Remember that the ChatGPT prompt comes from this string output here, and we connect it to the positive prompt of any workflow we have. Now when I run the workflow, ChatGPT generates a prompt. That prompt is used in the Z Image workflow, and the result is this cute monk cat.
But the ChatGPT model is also a vision model, which means I can give it an image and it can see what is in that image. Let me remove this Concatenate node. Let's add a Load Image node. I upload this image of a helmet. Now we can connect this node to where it says images. It says images because you can add multiple images if you use a Batch Images node, but maybe we explore that in a future episode. For the prompt, let's say something like: give me a single prompt description for this image, descriptive prompt. There are more complex formulas, but I am just trying something on the spot. Now, let's run the workflow. ChatGPT looks at my image, and after a few seconds, it should give me a prompt based on that image. We got this nice long prompt, and the result is this one. It is not perfectly identical, but with a better formula, we can probably get something even closer. It is still pretty close to what we asked. You can also run the workflow multiple times to get different seeds. Let's see what else we can do.
Let's create a new blank workflow.
Double click on the canvas and search for Nano Banana. We have this first version of Nano Banana that is cheaper.
It is eight credits, so you can probably get something similar for free from Google Gemini. We also have Nano Banana Pro, the more powerful model that can do big images. This one costs 28 credits. If we change to 4K size, it will cost 51 credits. Depending on the model, some can cost over 100 credits, so be careful what nodes you use because you can run out of credits pretty fast. Both accept images, but Nano Banana Pro understands prompts and images better. You can see what model is used for the first Nano Banana. And for Nano Banana Pro, it is actually called Gemini 3 Pro Image.
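Before running paid nodes, it helps to budget those credit numbers. A tiny sketch in plain Python: the credit costs come from the examples in this episode, but the dollar conversion is my assumption for illustration only, not a published rate.

```python
# Rough budget check before running paid API nodes.
# Credit costs are the example values mentioned in this episode;
# USD_PER_CREDIT is an ASSUMED rate for illustration only,
# check the partner node pricing page for real prices.
COSTS = {
    "nano_banana": 8,
    "nano_banana_pro_2k": 28,
    "nano_banana_pro_4k": 51,
}
USD_PER_CREDIT = 0.01  # assumption, not an official rate

def runs_affordable(credits_available: int, node: str) -> int:
    """How many times you could run a node with the credits you have."""
    return credits_available // COSTS[node]

print(runs_affordable(300, "nano_banana_pro_2k"))  # 10
print(runs_affordable(300, "nano_banana_pro_4k"))  # 5
print(round(COSTS["nano_banana_pro_4k"] * USD_PER_CREDIT, 2))  # 0.51 under this assumption
```

A quick check like this saves you from accidentally burning through your credits on a workflow you only meant to test.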
Let's remove the first node and use this one to generate an image. I add a load image node so we can load an image from disk. Then connect the nodes. Now I upload an image, for example, this portrait of a man. Then I add the prompt. We did not talk yet about editing models like Flux 2, Qwen Edit, or Nano Banana Pro, but these can be used to edit or modify an image. I could say change the t-shirt, or replace the background or hair color. Let's try something simple like telling it to change what he wears to a steampunk suit. For the resolution for this test, I go with 2K since it uses about half the credits. We also have aspect ratio. Instead of auto, I set it to 9:16, but you should use whatever ratio you need. When I run it, I get a prompt failed message. Can you guess why? It says it has no output. That is because we did not save the image. So, let's drag a link and add a save image node. I also
see it has a string output. So, let's
add a text preview node to see what it outputs there.
Now, we can run the workflow again. This
one takes longer, over a minute, to generate. You can also check your profile here to see how many credits you have, or sign out, manage your subscription, and so on. You can also check partner node pricing. This opens the Comfy UI website where you can see how much it costs to use any of the models that are not free, the so-called partner nodes.
You have models for images, text, and also a few nodes for video. These usually need a lot of VRAM, and you can generate video even if you do not have that VRAM locally, but at a cost. Back in Comfy UI we got our generation. If we look at it, the result is quite good in 2K size and it is quite similar to the original man. So it is a good model, but expensive. For the text output we also got something like a peek into what the model was thinking, basically a prompt it used to generate that image based on the small prompt I gave it and the image. You can explore more API node workflows created by the Comfy UI team.
If you go to templates here, you have all kinds of workflows, but if you want to see the API ones, select partner nodes. Then you can filter them by model if you know what model you are looking for, or just explore random workflows. It does not cost you anything to open and check a workflow. It only costs credits when you run it. Let's say I like the preview of this workflow. I click on it and I get the workflow. Let's see what it uses. We have a load image node, so it expects an image from our computer. We have a Nano Banana prompt and it says color this image. So, if you upload a sketch, it will color that sketch using the Nano Banana Pro model. By default, it is set to 1K, but you can change the settings to fit your needs. Let's go
again to templates and check another workflow, maybe this one with the shoe. This one is more complex. It expects an image of a product like a shoe. Then it uses a ByteDance model, which is similar to Nano Banana but a cheaper version. Once the image is saved, it goes to different video models. These models cost around 103 credits each. It looks like it generates multiple videos from that shoe, depending on the prompt, and then combines all those videos into one final video. A workflow this big takes some time to run and can cost you maybe around 300 credits, so roughly a couple of dollars. I did not do the exact math, but you can spend credits very fast with video models. One important thing to understand is that API nodes do not make Comfy UI cloud-based. Comfy UI is still running locally. You are simply adding external steps into your pipeline. From a mental model point of view, treat API nodes exactly like normal nodes. The cables do not care where the data comes from. If the output type matches, it works. This also means you can mix local models, GGUF models, diffusion models, and API nodes all in the same workflow.
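The "cables only care about types" idea can be sketched in a few lines. The node and type names below are hypothetical, not the real ComfyUI code: the point is just that a link is valid whenever the output type matches the input type, no matter whether the producing node ran locally or called an external API.

```python
# Toy version of the type-matching rule that ComfyUI cables follow.
# (Hypothetical illustration, not the actual ComfyUI implementation.)
def can_connect(output_type: str, input_type: str) -> bool:
    """A cable is valid when the two port types match exactly."""
    return output_type == input_type

# A local sampler-style node and a paid API node can both emit "IMAGE",
# so either one can feed a save image node that expects "IMAGE":
print(can_connect("IMAGE", "IMAGE"))   # True
# A "STRING" output cannot plug into an "IMAGE" input:
print(can_connect("STRING", "IMAGE"))  # False
```

This is why mixing local, GGUF, and API nodes in one workflow just works: the graph only checks the types on the ports, not where the data was computed.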
Comfy UI is a complex application with a lot of nodes created by different people. And we combine all these things together like Lego pieces. At some point you will get an error, either because you forgot to connect a link, you connected the wrong nodes, or you used the wrong models. In this chapter I will try to explain what you can do when that happens, because it will happen. If we look at the workflows, we have this workflow with the number zero, the first one. This one is for help and resources. And I tried to gather here some information that might help you. Let's start with resources. The best way to learn Comfy UI is to watch tutorials and to practice. You have here a link to the Pixaroma YouTube channel, but there are many other YouTubers who do tutorials for Comfy UI. You can search on YouTube for different tutorials. Try to look for more recent ones, because if a tutorial is two years old, most things are probably different. Now, that is one of the reasons I made this new series. On the top of my YouTube header, I added a link to my Discord. Click on it, then click go to site, and you will get an invitation to join the Pixaroma Discord server. If for some reason it says invalid, try a different browser or the mobile application. You click accept invite and you will land in the welcome channel. I will show you more about how to navigate Discord in a minute. In this note, I also included a link to Discord and some useful info, like where the Pixaroma workflows are and so on. Let's click on this link and we get to the same invite. It goes to the same server and the same welcome channel. This is my server called Pixaroma, but there are many other servers for Comfy UI. For example, if you go to the comfy.org website and then to resources, they also have a Discord link. It is the same process. You accept the invite and you land on their welcome channel. On the left side, you have the servers you joined, like my Pixaroma server or the Comfy UI server. Let's go to Pixaroma and explore a bit more. On the left, we have different channel names so we stay organized. Each channel has its purpose.
There are also categories that contain multiple channels, which you can collapse or expand. For example, if I collapse this category, you might think you cannot find the Pixaroma workflows channel. But if we click on this arrow, we expand the category and now we can see the Pixaroma Workflows channel there. For every server you join, check the rules so you know what you are allowed to do, and so you do not break the rules and get banned. If your Discord account gets hacked and posts spam in your name, you might get banned as well. You can send me a message to remove the ban if you fixed your account and it is not hacked anymore. Here we also have a help channel where you can find what each channel is for and where to post. Some channels are public, some are private and only for members, and some are public but only moderators or admins can post, like news and updates. We also have a daily challenge for people who use AI. You can find more info in this channel and you can participate in the challenge in the daily challenge channel. When you see a number with a red circle, that means someone mentioned you or everyone on the server. For example, when you see that the news and updates channel has a notification, go check it, because I probably posted a new tutorial or shared an update. You can see this post used everyone to mention everyone on the server. In off topic, you can discuss things that do not fit in Comfy UI or other channels, but try to avoid spam and make sure it still respects the rules. The Comfy UI channel here usually has the most active chat. People talk about Comfy UI, so if you post here for quick help, and if members know the answer and have time, you might get help. If not, it might get ignored.
Another channel where you can post is the forum. There you usually post things for longer term discussion. You might post today and get replies in hours, days, or sometimes not at all if people do not know the answer. You can see that I can post in this channel because it lets me type. You can ask for help here and include screenshots and all the details. This is also the channel where EVO posts updates about the Easy Installer, which he continuously improves and adds more scripts to make things easier. Thanks to EVO for all the help. Make sure you check this area for updates related to the Easy Installer. The most visited channel is probably the Pixaroma workflows channel, where people come to get my workflows from tutorials. Here I have a list of older episodes from 2024 to 2025 and also this new series I am doing now. Starting with the first episode, you can see links that lead to specific episodes. For example, if I click on the first episode, I land on this page where I will also add a link to the YouTube video once it is ready. You will find all the chapters of that video plus links to Comfy UI and the workflows. You can download the workflows either as a zip archive that you extract or as individual JSON files. You can also comment on this forum post. Since I post this series as forum posts, you can comment if something does not work so we can try to fix it if possible. Keep the conversation limited to that specific episode. For the next episodes, comment on their respective posts. If it is not related, use the forum or the Comfy UI channel. For off-topic discussions, use the off-topic channel. You can also use Discord to navigate quickly. You can create links to different channels. For example, if I type the hash sign, I can select different channels like Pixaroma Workflows. Or I can type hash and then help, and you can see what happens. It adds a link to that channel. When I press enter, I get a clickable link to the help channel. If I click it, I land in the help channel. Let's go back to the off-topic channel. If you hover over a message, you have different options like edit or add reactions. Some servers also allow you to use emojis from other servers. For example, I can select this Pixaroma bunny emoji. To remove a message, hover again, click the three dots, and you have different options, including delete. If I type hash and then Pixaroma, and select Pixaroma workflows, press enter, then click it, I land in the Pixaroma workflows channel.
I keep getting messages that people cannot find the Pixaroma Workflows channel. So, I hope this tutorial helps you find it more easily. In channels where you cannot comment, you will see a message saying that posting is not allowed. Usually only admins or moderators can post there. Use the Comfy UI channel for discussion and help related to Comfy UI. Use off-topic if you cannot find the right channel. Let's go to the forum. Here you can find all kinds of forum posts. For example, we have this pinned forum post that you should read before you post anything. From here you can create a new post. You can close this if you want to see more of the forum. When you create a post, you can add a title, a message, and screenshots with your workflow that has problems. You can also add tags to your post, depending on what the post is about. Add a clear title and a descriptive message, not something vague. When you are done, you can post it using this button. You can also check other posts to see how they are written. For example, the workflows from the first episode are posted in a forum post.
Besides that, you have more channels for AI video and AI music, for ChatGPT and other AI topics, and a few more channels that I will let you explore in your free time. Keep discussions civilized and help when you can. I visit the Discord every day, but I cannot respond to all messages. Mention Pixaroma if something is important. In the top right, you also have an inbox. On the left side, if you click on the logo, you have direct messages where you can talk with your friends. If you click on unread, you can see notifications, including mentions, so you can quickly see when someone mentioned you or everyone on the server. You can jump directly to that message using the jump button. Always check mentions, especially when you see your username. You also have a search bar, which many people forget exists. Here you can type a model name or a few words that people might have used in discussions. For example, if I search for LTX2, I can see quite a few posts with that search query and I can jump to any of those discussions. You can also search posts from a specific user. For example, I can search for posts from Pixaroma. Make sure the username is Pixaroma and not something else, because some people try to mimic the name. Both the username and display name should be Pixaroma. Now you can see all the posts from Pixaroma. You also have more options and filters that you can use for different channels and searches. You can use these arrows to reply to a message or forward it to someone else.
Okay, enough with Discord. Let's go back to Comfy UI. I included here more resources for Comfy UI, like the official ones and also some unofficial ones like Reddit or Facebook groups that you can try. Let's open the Reddit group, for example. We have this Comfy UI Reddit group where you can see discussions, news, tutorials, and so on, and where you can post your questions. There is also one called Stable Diffusion, which includes discussions about Stable Diffusion, free models, and Comfy UI, but also other interfaces, not only Comfy UI. You can also search for a word on Reddit, like Comfy UI, and sort the results by communities. Then you can check which ones have more members. The two I use the most are these ones. Make sure you also check the other notes I added here, like definitions for beginners, what a model is, what a text encoder is, and so on. There's also more information about performance, common errors and fixes, model locations, custom nodes, and how to update Comfy UI. I also included a link to the Easy Installer in case you want to go back to it and find more info or check what is new in the releases. If you want, I also created an experimental custom ChatGPT that you can try, especially for this easy install Comfy UI version. Like any ChatGPT, it can hallucinate sometimes, but it is still better than a simple chat because it is more specialized for Comfy UI. For example, if I ask where the images are saved, it will think and also search the knowledge database where I added some files. Then it will answer. You can see the answer is pretty good. So I think it will help a lot of beginners. Sometimes, if you think it made a mistake, maybe because something is new and the model was trained months ago, you can ask, "Are you sure? Look online," and it will search the web. This way you can double check and improve your chances of getting a more accurate response. In this case, it knew that images are saved in the output folder. Let's ask something else, like: where are the Pixaroma workflows? Where can I find them? It will tell you they are on Discord and give you the channel name.
Let me try something else. Let's open workflow number one, the Juggernaut text to image workflow, and disconnect this node to cause an error. When I run it, I get this error. Now I take a screenshot of this error, go to that custom chat, paste the screenshot, and ask how to fix this error. You can see that in this case, since it was a simple error, it knew how to answer and told me to drag a wire from the load checkpoint VAE to the VAE decode node. This can save you a lot of time in many cases. So, I hope you find it useful. You can give it more screenshots and more info, even ones without the error, so it can understand the workflow better. Sometimes it asks you to post an error report on GitHub, and you can find here a report that gives more info about the error. You also have a find this issue option, which opens the issue pages on the Comfy UI GitHub page. This is the official Comfy UI GitHub page for the portable version, not the Easy Installer. Even though the Easy Installer installs the same version plus extra scripts, there is an issues tab where people post problems. You can search issues that are open, or include closed ones as well. You can also post a new issue if it is something new and you did not find any information about it.
Make sure it is an issue with a Comfy UI node, not a custom node. For custom nodes, you need to go to the custom node page instead. To fix this error, we just connect the VAE back to VAE decode. But let's say you have your VAE as a separate file for some workflows. So you use load VAE to load a VAE and connect that to VAE. Let's see what happens when I run the workflow. It gives this error, which usually means we used models with different architectures that are not meant to work together. The error is shown in VAE decode, but VAE decode is not really the problem. The problem is the input that goes into that node. In this case, the VAE loaded with load VAE was the issue. Let's go back to the help workflow. I want to remind you that when you ask for help, include screenshots of your workflow. Tell us what video card you have, how much VRAM and system RAM you have, and which operating system you are using. Also, explain what you already tried and what did not work. This helps the community assist you faster. Okay, one more chapter to go.
Are you ready? You have now reached the end of this course. At this point, you understand how Comfy UI works, how workflows are built, how models differ, and how to use tools like LoRA, ControlNet, and advanced diffusion models. But learning Comfy UI does not really end here. This is just the foundation. The most important thing to understand is that Comfy UI is not a fixed tool. It is constantly evolving. New models appear, new nodes are created, new workflows solve problems in better ways. So the best way to continue learning is by experimenting. Open workflows, break them, rebuild them, change one thing at a time, and see what happens. That is how real understanding happens. Another important habit is reading workflows, not just using them. When you download a workflow, do not just press run. Look at the nodes. Follow the connections. Ask yourself why something is there. If a workflow looks confusing, that usually means it is teaching you something new. Next, stay
connected to the community. Use Discord to ask questions, share results, and help others when you can. Very often, answering someone else's question will make you understand things better yourself. Follow model releases, but do not chase everything. You do not need every new model. Find a few that work well for your style and hardware and learn them deeply. As you get more comfortable, start building your own workflows from scratch, even simple ones, especially simple ones. That is how you move from copying to creating. Also, remember that AI tools change fast. What matters most is not memorizing settings, but understanding concepts: noise, conditioning, sampling, structure versus style. Those ideas will stay useful even when models change. Finally, do not rush. There is no finish line here. Learning Comfy UI is a process, not a goal. Take your time, have fun, and keep experimenting. This is just the beginning. So, what comes next? Obviously, I will continue this series and do episodes 2, 3, and so on. But I cannot make them as big as this first episode. They will be shorter videos focused on things we did not learn yet, like other models such as Qwen or Flux, video models, and so on. We still have a lot to cover, and every week or month we see new models and new nodes appearing. My plan for the new series is to show you these new models and workflows in a more easy to understand way so everything makes sense as much as possible. Some of these workflows and models are so new that nobody really knows much about them yet. I will try to post a new episode every week if my health allows it. If not, then at least one episode every 2 weeks. This new series will have bunnies on the thumbnails so you do not confuse it with the old series.
For the new series, as you saw, the workflows on Discord are posted in the forum. This makes it easier for me to see when you find a bug or when something does not work anymore, so I can try to fix it. That is why the old series, even though it still has good tutorials and you can still watch it, especially the last episodes, will not receive updated workflows. I will not go back and try to fix those old workflows. Instead, I will focus on the new series. When needed, I can revisit those older workflows, adapt them to new models, and present that in a new episode in the new series. I wanted you to have the basics in this long episode 1, this course that you probably cannot find somewhere else. I worked one month on this episode, and I wanted everyone to have access to it for free. I could have put it behind a paid course, but I feel better when I can help people. That being said, I do appreciate your support. There are many ways you can help me and this channel so I can create more videos. The easiest way is to press the like button, subscribe to the channel, and leave a comment, even if it is just a simple thank you. This shows activity to the YouTube algorithm and helps the video reach more people. So now, if someone asks where they can start learning Comfy UI, you can share the link to this course. I will also create a new playlist that will host all the new episodes from this series. For those who can afford to buy me a cup of tea, since I do not drink coffee, being a bunny, I already have too much energy, you can use the join button. Here you have four different options, from really cheap, like half a cup of tea per month, to more expensive, like a premium cup of tea. Depending on the option you choose, you get different perks. For example, Legends have a private channel on Discord where they get to know me better. If you do not want to help monthly, you can also help one time. On each video, you can find this heart with a dollar sign called Super Thanks. Super Thanks allows you to select an amount of money that you want to donate and send it. You can use this for videos that really helped you, like this course or any other episode where you learned something useful. Speaking about Legends and those who subscribed to the membership, I want to thank all of you who made this course possible with your support. Together, we can help other people learn new tools, understand this crazy AI world we live in, and maybe even make it a better place. Thank you all. You are the best. Have a great day, and I will see you on Discord.