How to train Flux LoRA models

The Flux Dev AI model is a great leap forward in local Diffusion models. It delivers quality surpassing Stable Diffusion 1.5 and XL models. Like Stable Diffusion models, you can train LoRA models on top of Flux to inject custom characters and styles.

In this post, I will provide a Google Colab notebook for training Flux LoRA models. With some setup, you can also run the training workflow locally if you have a good GPU card.

Requirement

You must be a site member to access the Colab Training notebook and training images below.

You can download the ComfyUI training workflow for free to train locally.

Software

This is an advanced tutorial for training your own AI model. If you are new to Stable Diffusion or Flux AI models, start with the Quick Start Guide for a beginner-friendly setup.

This training workflow uses ComfyUI as the GUI. It relies on the ComfyUI Flux Trainer custom node, modified for ease of use, which calls the tried-and-true Kohya LoRA Trainer under the hood.

The training workflow consists of two parts:

  1. Generate captions of the training images automatically.
  2. Apply those images to Flux.1 dev model to train a custom LoRA model.

Alternatives

The Flux1.dev model requires more memory to run. Alternatively, you can train:

Training on Google Colab

What you need before you start

Google Colab Pro Plan

Training a Flux LoRA model is computationally demanding, so this notebook requires a paid Google Colab plan. I use the Colab Pro plan, but both the Pro and Pro+ plans work.

It typically takes ~4.5 hours to train a LoRA on an L4 instance. As of September 2024, the Colab Pro plan costs $10 a month, and you can use an L4 for about 33 hours. So, training a LoRA on Colab will set you back ~$1.40.
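The cost figure above is simple arithmetic, and a short sketch makes it easy to redo with your own numbers. The constants below are the ones quoted in this post; the function and variable names are made up for illustration.

```python
# Figures quoted in the text (September 2024); replace them with your own.
PLAN_COST_USD = 10.0       # monthly plan price
L4_HOURS_PER_PLAN = 33.0   # approximate L4 hours the plan buys
TRAINING_HOURS = 4.5       # typical time to train one LoRA on an L4


def cost_per_lora(plan_cost, gpu_hours, training_hours):
    """Return the share of the plan price used by one training run."""
    hourly_rate = plan_cost / gpu_hours
    return hourly_rate * training_hours


estimate = cost_per_lora(PLAN_COST_USD, L4_HOURS_PER_PLAN, TRAINING_HOURS)
print(f"Estimated cost per LoRA: ${estimate:.2f}")
```

With these inputs the estimate comes out at about $1.36, which is where the rounded ~$1.40 comes from.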

Training images

To train a Flux LoRA model, you need a set of training images. 10-20 images should do the trick for training a face.

ONLY PNG images are supported.

If you are a member of this site, you can download the example training images below.

They are all cropped to 1024×1024, but the training notebook supports different sizes.

Tips for good training images:

  • The ideal size is 1024×1024. It is OK to use images with different sizes but be sure to include some close to 1024×1024.
  • Diversity is the key. You want your subject to be in different scenes, settings, and clothing. Otherwise, the model will be confused about what you are trying to train.
  • If you are training a face, include a few high-resolution headshots.
  • The default parameters work for 10 – 20 images.

Step 1: Upload images to Google Drive

Put your training images in the folder AI_PICS > Flux_trainer_input in your Google Drive. (The folder names are case-sensitive.)

Note: Only PNG images are supported.
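Since only PNG files are accepted, it can help to check the folder before starting a paid session. The following is a minimal sketch that uses only the Python standard library: it reads the PNG signature and the width and height from each file header. The function names are hypothetical, and the Drive path in the usage section assumes the usual Colab mount point, which may differ in your setup.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) if the file is a PNG, else None."""
    with open(path, "rb") as f:
        header = f.read(24)
    # A PNG starts with an 8-byte signature followed by the IHDR chunk:
    # 4-byte length, 4-byte type "IHDR", 4-byte width, 4-byte height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def check_training_folder(folder):
    """Split the files in `folder` into valid PNGs (with sizes) and other files."""
    pngs, others = {}, []
    for path in sorted(Path(folder).iterdir()):
        # Caption .txt files live in the same folder, so skip them.
        if not path.is_file() or path.suffix.lower() == ".txt":
            continue
        size = png_size(path)
        if size is None:
            others.append(path.name)
        else:
            pngs[path.name] = size
    return pngs, others


if __name__ == "__main__":
    # Assumed Colab mount path; adjust to where your Drive is mounted.
    folder = Path("/content/drive/MyDrive/AI_PICS/Flux_trainer_input")
    if folder.is_dir():
        pngs, others = check_training_folder(folder)
        print(f"{len(pngs)} PNG images found")
        for name, (w, h) in pngs.items():
            print(f"  {name}: {w}x{h}")
        if others:
            print("Not PNG (convert these first):", ", ".join(others))
```

The check looks at file contents rather than extensions, so a JPEG renamed to .png is reported as unsupported.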

Step 2: Run the training notebook

Open the Google Colab Notebook below. You need to be a site member to access it.

Click the Run button to start running the notebook.

You will be asked to grant permission to access your Google Drive. Grant permission, as it is necessary to load the training images and save the LoRA model.

It will take a few minutes to load. When it is done, you should see a URL and a Tunnel Password like the ones below.

Visit the URL and enter the Tunnel Password to access ComfyUI.

Step 3: Load the workflow

Download the Easy Flux Trainer workflow below and drop it on the ComfyUI browser page to load it.

You should see the workflow loaded like the screenshot below.

Step 4: Review input parameters

Review the input parameters in the Inputs group.

  • LoRA name: The name of your LoRA. Pick a name that matches what you are training for.
  • Token: The trigger keyword of your LoRA. Put this in the prompt when using your LoRA.
  • Image Input Path: The folder path of your training images in Google Drive. You don’t need to change this unless you have put the images in another folder.
  • LoRA Output Path: The folder where the LoRA model will be stored.
  • Test prompts: The prompts used to test the LoRA model during training, separated by a vertical line (|). Include the token in a prompt to test what the LoRA has learned. Also include a prompt without the token to monitor over-training.
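As a rough illustration of the test prompt format, the sketch below splits a prompt string on the vertical line and reports which prompts contain the token. The example prompts and function names are hypothetical; "emma" is the token used later in this tutorial. This only mimics the format described above and is not the workflow's own parsing code.

```python
import re


def split_test_prompts(raw):
    """Split a '|'-separated prompt string into individual prompts."""
    return [part.strip() for part in raw.split("|") if part.strip()]


def uses_token(prompt, token):
    """Return True if the token appears as a whole word in the prompt."""
    words = re.findall(r"\w+", prompt.lower())
    return token.lower() in words


# Hypothetical test prompts: one with the token, one without.
raw_prompts = "emma skiing outfit, snow, smile | a woman walking in a park"
token = "emma"

prompts = split_test_prompts(raw_prompts)
for prompt in prompts:
    label = "with token" if uses_token(prompt, token) else "without token"
    print(f"[{label}] {prompt}")

if not any(uses_token(p, token) for p in prompts):
    print("Warning: no prompt tests the token.")
if all(uses_token(p, token) for p in prompts):
    print("Warning: no prompt without the token to monitor over-training.")
```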

Step 5: Generate captions

The workflow can generate captions automatically for your training images using the BLIP vision-language model.

In the Step Selector Input node, set:

  • Enable Captioning: Yes
  • Enable Training: No

Click Queue Prompt to start captioning.

After it is done, you will see text files with the same names as the training images in the image input folder.

You can optionally revise them to better match the images.
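Before moving on, you may want to confirm that every image received a caption. The sketch below lists PNG files that have no matching text file. It assumes a caption for image01.png is saved as image01.txt in the same folder; the function name is hypothetical.

```python
from pathlib import Path


def missing_captions(folder):
    """Return the names of PNG files that have no matching .txt caption."""
    folder = Path(folder)
    return sorted(
        image.name
        for image in folder.glob("*.png")
        # Assumption: the caption shares the image's name with a .txt extension.
        if not image.with_suffix(".txt").is_file()
    )


def read_captions(folder):
    """Return {image name: caption text} for every captioned PNG."""
    captions = {}
    for image in sorted(Path(folder).glob("*.png")):
        caption_file = image.with_suffix(".txt")
        if caption_file.is_file():
            captions[image.name] = caption_file.read_text(encoding="utf-8").strip()
    return captions
```

Calling read_captions on the input folder gives a quick overview of all captions in one place, which makes it easier to spot the ones worth revising.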

Step 6: Start training

Now, we get to the fun part of training.

In the Step Selector Input node, set:

  • Enable Captioning: No
  • Enable Training: Yes

The workflow is configured to test and save the LoRA with the prompts you specified every 400 steps. You should see 4 blocks of Flux Train Loop nodes like the one shown below.

You can change the step values to change the interval between saving/testing the LoRA.

Click Queue Prompt to run the workflow.

Running on an L4 GPU instance (default) takes ~1.5 hours per 400 steps. You can view the training results in ComfyUI, like below.

If you don’t see the images, you can find them in the samples folder inside the LoRA output folder. The default output folder is AI_PICS > Flux_trainer_output in your Google Drive.

It typically takes 1,000-1,500 steps to train a LoRA for Flux.1 Dev.

Feel free to stop the workflow early if you have already achieved the results you want.
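To plan a session, the timing figures above can be turned into a checkpoint schedule. This is a back-of-the-envelope sketch using the numbers quoted in this section (4 blocks of 400 steps, ~1.5 hours per 400 steps on an L4). The names are made up for illustration, and actual times will vary.

```python
HOURS_PER_400_STEPS = 1.5  # quoted for the default L4 instance
STEPS_PER_BLOCK = 400      # save/test interval of each Flux Train Loop block
NUM_BLOCKS = 4             # number of Flux Train Loop blocks in the workflow


def checkpoint_schedule(steps_per_block, num_blocks, hours_per_400_steps):
    """Return (cumulative steps, estimated elapsed hours) for each checkpoint."""
    schedule = []
    for block in range(1, num_blocks + 1):
        steps = block * steps_per_block
        hours = steps / 400 * hours_per_400_steps
        schedule.append((steps, hours))
    return schedule


schedule = checkpoint_schedule(STEPS_PER_BLOCK, NUM_BLOCKS, HOURS_PER_400_STEPS)
for steps, hours in schedule:
    print(f"step {steps:>5}: ~{hours:.1f} h")
```

With the defaults, the 1,200-step checkpoint lands at roughly 4.5 hours and the full 1,600 steps at roughly 6 hours.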

Step 7: Test the LoRA

Follow the tutorial How to use LoRA with Flux AI model to test your LoRA.

In addition to adding the LoRA, remember to add the token (trigger keyword) to the prompt. (“emma” in this tutorial)

<lora:flux_emma_rank16_bf16:1> emma skiing outfit, snow, smile, brown hair, light, safety goggle

Use Ngrok for faster connection

If you experience a slow connection with Local Tunnel, you can try using ngrok instead of Local Tunnel to establish a public connection. It is a more stable alternative.

You will need to set up a free account and get an authtoken:

  1. Go to https://ngrok.com/
  2. Create an account.
  3. Verify your email.
  4. Copy the authtoken from https://dashboard.ngrok.com/get-started/your-authtoken and paste it into the NGROK field in the notebook.

Training locally on Windows/Linux

You can use this workflow locally, although I won’t support it. You can download the workflow JSON file above.

Software setup

It is better to install a fresh copy of ComfyUI just for running this training because the workflow has a low tolerance for conflicts.

Use PyTorch 2.4.

Install ComfyUI Manager.

Drop the workflow onto the ComfyUI browser page and install the missing custom nodes using the ComfyUI Manager.

I am not able to keep up with the breaking changes in the custom nodes. So, the workflow only works for particular versions of ComfyUI and custom nodes.

Use git checkout to check out the following commits accordingly.

  • ComfyUI: 9c5fca75f46f7b9f18c07385925f151a7629a94f
  • ComfyUI-Manager: ce874d5c624d5713e7db334d1e0c50aeddb90d82
  • was-node-suite-comfyui: bb34bd429ab74a22a7f58551429a0f3046e1464e
  • rgthree-comfy: cae8e2ad28ddb933a916b852d26b95726f60529f
  • ComfyUI-KJNodes: 7aa591b3a64a3f83ec2c3e92758d0bb0926a6fe0
  • ComfyUI-FluxTrainer: c3aa4ea889153519f7be40636b44d5a03a060816
  • Image-Captioning-in-ComfyUI: 9b24deea8eef830da059aa91cac9690ecde19fda

For example, run the following under the ComfyUI folder.

git checkout 9c5fca75f46f7b9f18c07385925f151a7629a94f

And the following under the custom_nodes\ComfyUI-FluxTrainer folder.

git checkout c3aa4ea889153519f7be40636b44d5a03a060816
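If you prefer not to run git checkout seven times by hand, the pinned commits can be applied with a small script. This is a sketch under two assumptions: each custom node is cloned into custom_nodes under a folder with the same name as in the list above (only the ComfyUI-FluxTrainer folder is confirmed by this post), and git is on your PATH. The function names are hypothetical. It defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path

# Folder (relative to the ComfyUI root) -> pinned commit from the list above.
# Folder names other than ComfyUI-FluxTrainer are assumed to match the repo names.
PINNED_COMMITS = {
    ".": "9c5fca75f46f7b9f18c07385925f151a7629a94f",  # ComfyUI itself
    "custom_nodes/ComfyUI-Manager": "ce874d5c624d5713e7db334d1e0c50aeddb90d82",
    "custom_nodes/was-node-suite-comfyui": "bb34bd429ab74a22a7f58551429a0f3046e1464e",
    "custom_nodes/rgthree-comfy": "cae8e2ad28ddb933a916b852d26b95726f60529f",
    "custom_nodes/ComfyUI-KJNodes": "7aa591b3a64a3f83ec2c3e92758d0bb0926a6fe0",
    "custom_nodes/ComfyUI-FluxTrainer": "c3aa4ea889153519f7be40636b44d5a03a060816",
    "custom_nodes/Image-Captioning-in-ComfyUI": "9b24deea8eef830da059aa91cac9690ecde19fda",
}


def checkout_plan(comfy_root):
    """Return a list of (repo folder, git command) pairs for the pinned commits."""
    root = Path(comfy_root)
    return [
        (root / relative, ["git", "checkout", commit])
        for relative, commit in PINNED_COMMITS.items()
    ]


def apply_plan(plan, dry_run=True):
    """Print each command and, unless dry_run is set, run it in its repo folder."""
    for repo, command in plan:
        print(f"{repo}: {' '.join(command)}")
        if not dry_run:
            subprocess.run(command, cwd=repo, check=True)


if __name__ == "__main__":
    # Dry run by default; pass dry_run=False to actually check out the commits.
    apply_plan(checkout_plan("ComfyUI"), dry_run=True)
```

Review the printed commands first, then rerun with dry_run=False once the folders look right.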

Follow the Flux tutorial for ComfyUI to download the Flux.1 Dev UNet model, text encoders, and VAE.

By Andrew

Andrew is an experienced engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, photography, and education. He has a Ph.D. in engineering.

41 comments

  1. Hi again,

    Trying to use it via Colab, I got the following:

    FluxTrainLoop
    Allocation on device

    and on logs, got these lines with errors:
    2024-10-26 02:43:11,402 - root - INFO - got prompt
    2024-10-26 02:43:11,835 - root - ERROR - Failed to validate prompt for output 73:
    2024-10-26 02:43:11,835 - root - ERROR - * UploadToHuggingFace 89:
    2024-10-26 02:43:11,835 - root - ERROR - - Required input is missing: network_trainer
    2024-10-26 02:43:11,835 - root - ERROR - Output will be ignored

    any ideas? thank you!

      1. Just tried again and seemed to get the same error…

        # ComfyUI Error Report
        ## Error Details
        - **Node Type:** FluxTrainLoop
        - **Exception Type:** torch.OutOfMemoryError
        - **Exception Message:** Allocation on device
        ## Stack Trace

        (…)

        2024-10-27 10:04:24,690 - root - INFO - got prompt
        2024-10-27 10:04:24,844 - root - ERROR - Failed to validate prompt for output 73:
        2024-10-27 10:04:24,844 - root - ERROR - * UploadToHuggingFace 89:
        2024-10-27 10:04:24,844 - root - ERROR - - Required input is missing: network_trainer
        2024-10-27 10:04:24,844 - root - ERROR - Output will be ignored

          1. Hello I get the same error with the updated notebook and Colab Pro and L4 instance with high RAM. The error in the colab: “Got an OOM, unloading all loaded models.”
            The error in ComfyUI:
            # ComfyUI Error Report
            ## Error Details
            - **Node Type:** FluxTrainLoop
            - **Exception Type:** torch.OutOfMemoryError
            - **Exception Message:** Allocation on device
            ## Stack Trace

          2. Mmm.. This error was fixed in the Oct 26th update. I just ran it successfully. If you still see the error, send the notebook URL and the whole log to my email.

          3. Yes, but tried again and apparently it worked overnight. From what I could tell it completed 3 out of the 4x 400 steps and comfyui lost connection, saved in the output 3 safetensors files each named step00400, step00800, step01200.

            I’ll try again now but just to know what to expect, if 100% successful should I get a single safetensors output or one for each step (4 in total)?

            thanks!

          4. Yes you should get the 1600 step model saved if the run is complete. It is usually not necessary to train that long. You can test the 800 and 1200 model to see if they are good enough.

  2. Hi there,
    Trying to run it locally, feel free to ignore if you prefer not to support this as you mentioned.
    Got the captioning done successfully, when running the training I’m getting the following error:

    Prompt outputs failed validation
    UploadToHuggingFace:
    - Required input is missing: network_trainer
    InitFluxLoRATraining:
    - Failed to convert an input value to a FLOAT value: T5_lr, enabled, could not convert string to float: ‘enabled’

  3. Hi,
    the colab lora trainer works fine, thank you! I noticed that a message is displayed at the start of the training like “no text encoder used” and it seems to me that improving the captions does not improve the loss score…should I download a text encoder to the folder AI Pics/models/text encoder? If yes, is this text encoder used automatically or should I change some settings in the lora trainer? Best regards, Gerald

    1. I am not sure if training the text encoder is supported. You normally don’t need to train the text encoder since flux uses pretrained ones. Fine-tuning the UNet like in this notebook should be enough.

      1. Hi, ok thanks, I was afraid that the message meant that the Lora would be trained completely without captions! Because I had an extra message afterwards that the text encoder is not trained either, but maybe both messages referred to that. Best regards, Gerald

  4. When trying to caption, I get this error
    LoRA Caption Load
    local variable ‘image1’ referenced before assignment
    Please help

  5. Hi thanks so much for posting this tutorial.
    I have a problem: the preview image is totally different from the reference. I am training on a woman, and the preview shows some men in a different style.
    This is my second LoRA. The first time, I did not have this error.

    1. The LoRA is not training. Make sure you start with the default settings in the workflow. I found that changing the settings has a big effect. Using a similar number of images and similar diversity to the training example should work.

  6. I have also a question about an error message, maybe you can comment on that if you like.
    I had multiple tries where I could not go on because of this error message: # ComfyUI Error Report
    ## Error Details
    - **Node Type:** InitFluxLoRATraining
    - **Exception Type:** AttributeError
    - **Exception Message:** module ‘torch’ has no attribute ‘float8_e4m3fnuz’

    I managed to start my first run. I am not exactly sure what I did to avoid this error, but it worked. Unfortunately, this run was interrupted while I was away. When I tried to start again, I got this error message again. I tried toggling fp8_base in the Init node, but it did not make a difference. After googling, I found some comments that the torch version might not be the correct one. However, after installing
    pip install --upgrade transformers
    pip install --upgrade safetensors
    pip install --upgrade torch torchvision torchaudio
    it worked again. But I have no clue what actually the problem was. You?

    1. It may be caused by the order of installation when you try the notebook the 2nd time. Before running the notebook again, you can disconnect the runtime so that it starts fresh.

      1. No I don’t think so. I started the notebook many times from scratch and always landed there. I even switched to an A100 to test it and burned through half my credits. Only after pip install --upgrade transformers
        pip install --upgrade safetensors
        pip install --upgrade torch torchvision torchaudio
        it seemed to work.
        Do you have any explanation for that?

        1. Mmm.. Not so sure. The notebook works with a L4 instance with high RAM. As long as you have Colab Pro, you can use high RAM and launch the exact hardware config.

  7. Workbook is throwing a “torch.OutOfMemoryError 4 Flux TrainLoop” error (running it as per directions and not locally, just FYI).

      1. I have pay as you go, not Colab Pro, and yes using an L4 – the descriptions say that there is no memory difference between the GPU machines…

          1. Works on the Colab Pro plan, so it must be the CPU RAM. You may want to note that, and also that it requires PNG files for your images. Great work!

  8. Thank you for this tutorial.
    I have a Colab Pro account; when training, the ComfyUI workflow keeps crashing with this error:

    FluxTrainLoop
    Allocation on device

    Might you know what the issue is?

  9. I am getting the following error message on both Google Colab and locally. Any thoughts on what might be wrong?

    LoRA Caption Load
    cannot access local variable ‘image1’ where it is not associated with a value

    Also, the check arguments node is red and does not have any text, whereas in your screenshot there appears to be some text in it.

  10. Hi thanks so much for posting this tutorial. It works great, except the colab notebook keeps timing out due to inactivity in the middle of the training. It’s very frustrating. Do you by any chance have a workaround? Or a way to continue training on the 800th step?

    1. I have the same experience. I went away, and when I came back I thought it had crashed. Then I started the 2nd run; annoyingly, it does not continue where it stopped although I turned on save state. I had around 800+ its at this point. Then suddenly a captcha popped up asking “still there?”. So I was able to confirm this time and it went on. At the moment it is still running. But I can’t sit next to the computer for 6 hours just waiting for captchas. There has to be a way to continue from the saved states. I used to train LoRAs with ai-toolkit and train_lora_flux_24gb.yaml and it would resume if you stopped the process. I don’t see a future for this project as it is at the moment. Besides, running the Colab is so damn slow that it’s really frustrating to use. I have no paid plan but used Pay As You Go to buy credits.

      1. Yeah this was my experience as well. It worked though when I subscribed to the Colab Pro plan. It seems to only cause problems with the Pay As You Go plan.

  11. Great tutorial, thanks!
    When training faces you said to include high-resolution headshots, but a few lines before you wrote that the ideal training image size is 1024×1024. Do you mean for headshots use 1024×1024, or can I use larger sizes? From what I understand, during the training process it will resize to 512 or 1024 (defined in settings – I used fluxgym locally but couldn’t run it in Colab)
