How to train LoRA models


A killer application of Stable Diffusion is training your own model. Because it is open-source software, the community has developed easy-to-use tools for that.

Training a LoRA model is a smart alternative to training a checkpoint model. Although LoRA is less powerful than whole-model training methods like Dreambooth or fine-tuning, LoRA models have the benefit of being small. You can store many of them without filling up your local storage.

Why train your own model? You may have an art style you want to put in Stable Diffusion. Or you want to generate a consistent face in multiple images. Or it’s just fun to learn something new!

In this post, you will learn how to train your own LoRA models using a Google Colab notebook. So, you don’t need to own a GPU to do it.

This tutorial is for training a LoRA for Stable Diffusion v1.5 models. See training instructions for SDXL LoRA models.

Train a Stable Diffusion v1.5 LoRA

Software

The training notebook has recently been updated to be easier to use. If you use the legacy notebook, the instructions are here.

You will use a Google Colab notebook to train the Stable Diffusion v1.5 LoRA model. No GPU hardware is required from you.

If you are a member of this site, access the training notebook and images below.


Alternatively, you can purchase the notebook using the link below.

You will need Stable Diffusion software to use the LoRA model. I recommend using AUTOMATIC1111 Stable Diffusion WebUI.

Get the Quick Start Guide to learn how to use Stable Diffusion.

Step 1: Collect training images

The first step is to collect training images.

Let’s pay tribute to Andy Lau, one of the four Heavenly Kings of Cantopop in Hong Kong, and immortalize him in a Lora…

Andy Lau, one of the four Heavenly Kings, is getting ready for a Lora training.

Google Image Search is a good way to collect images.

Use Image Search to collect training images.

Let’s collect at least 15 training images.

Pick images that are at least 512×512 pixels for v1 models.

Make sure the images are in either PNG or JPEG format.
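A quick check before uploading can catch files that are too small or that are not really PNG or JPEG (for example, a photo in another format that was simply renamed to .jpeg). The sketch below is not part of the training notebook. It uses only the Python standard library, reads just the file header, and the helper names `image_info` and `is_usable` are made up for illustration. Treat it as a sanity check, not a full validation.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# JPEG start-of-frame markers carry the image size; C4, C8 and CC are not frames.
NON_FRAME_MARKERS = {0xC4, 0xC8, 0xCC}


def image_info(data: bytes):
    """Return (format, width, height) for PNG/JPEG bytes, or None otherwise."""
    if data.startswith(PNG_SIGNATURE) and data[12:16] == b"IHDR" and len(data) >= 24:
        width, height = struct.unpack(">II", data[16:24])
        return ("PNG", width, height)
    if data.startswith(b"\xff\xd8"):
        pos = 2
        while pos + 9 <= len(data):
            if data[pos] != 0xFF:
                return None  # not positioned at a marker: give up
            marker = data[pos + 1]
            if marker == 0xFF:  # fill byte before a marker
                pos += 1
                continue
            if 0xC0 <= marker <= 0xCF and marker not in NON_FRAME_MARKERS:
                # Frame header: length (2), precision (1), height (2), width (2)
                height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
                return ("JPEG", width, height)
            (segment_length,) = struct.unpack(">H", data[pos + 2:pos + 4])
            pos += 2 + segment_length
    return None


def is_usable(data: bytes, min_side: int = 512) -> bool:
    """True if the bytes are PNG/JPEG and both sides are at least min_side pixels."""
    info = image_info(data)
    return info is not None and min(info[1], info[2]) >= min_side


# Synthetic headers for demonstration only (not real images).
fake_png = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 512, 768)
fake_jpeg = (b"\xff\xd8" + b"\xff\xe0" + struct.pack(">H", 4) + b"\x00\x00"
             + b"\xff\xc0" + struct.pack(">H", 11) + b"\x08"
             + struct.pack(">HH", 600, 800) + b"\x00" * 6)
print(image_info(fake_png), image_info(fake_jpeg))
```

For a real file, pass `pathlib.Path(filename).read_bytes()` to `is_usable`.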

I collected 16 images for training. You can download them to follow this tutorial on the training notebook’s product page OR below if you are a member (login required).


Step 2: Review the training settings

Open the Easy LoRA Trainer SD 1.5 notebook.

Here are the inputs to review before running the cell.

Project folder

A folder in Google Drive containing all training images and captions. Use a folder name that doesn’t exist yet.

Pretrained model name

The Hugging Face name of the checkpoint model. Here are a few options.

The Stable Diffusion v1.5 model is the official v1 model. Note: The official repository on runwayml is deprecated. Use the following.

stable-diffusion-v1-5/stable-diffusion-v1-5

Realistic Vision v2 is good for training photo-style images.

SG161222/Realistic_Vision_V2.0

Anything v3 is good for training anime-style images.

admruul/anything-v3.0

You can find other models on Hugging Face using this link or this link. They have to be diffusers models. You can tell by a similar folder structure to the models above.
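As a rough illustration of "a similar folder structure", the sketch below checks a list of repository file paths for the pieces a Stable Diffusion v1 diffusers model usually has. The exact set of required subfolders is my assumption, the function name `looks_like_diffusers` is hypothetical, and the file lists are made-up examples rather than real repository listings.

```python
# Assumption: a Stable Diffusion v1 diffusers repository has a model_index.json
# at the top level plus one subfolder per component.
EXPECTED_SUBFOLDERS = {"unet", "vae", "text_encoder", "tokenizer", "scheduler"}


def looks_like_diffusers(repo_files):
    """Guess whether a list of repo-relative file paths is a diffusers model."""
    top_level = {path.split("/")[0] for path in repo_files}
    has_index = "model_index.json" in repo_files
    return has_index and EXPECTED_SUBFOLDERS <= top_level


# Illustrative file lists only.
diffusers_style = [
    "model_index.json",
    "unet/config.json",
    "vae/config.json",
    "text_encoder/config.json",
    "tokenizer/vocab.json",
    "scheduler/scheduler_config.json",
]
single_file_style = ["README.md", "some-model.safetensors"]

print(looks_like_diffusers(diffusers_style))
print(looks_like_diffusers(single_file_style))
```

A repository that only contains a single checkpoint file, like the second list, would not pass this check.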

Image repeats

How many times the training images are repeated in each training epoch.

Number of epochs

The number of passes over the repeated training images. Increase this number to train for longer.
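To get a feel for how image repeats and epochs combine, here is a small sketch of the rule of thumb given in the comments below: the amount of training is roughly number of images x image repeats x number of epochs. The batch-size adjustment is my assumption (the training commands quoted in the comments use a batch size of 3), and the trainer's real step count may differ, for example because of image bucketing. The function names are made up.

```python
import math


def samples_seen(num_images, image_repeats, epochs):
    """Rule of thumb from the comments: images x repeats x epochs."""
    return num_images * image_repeats * epochs


def optimizer_steps(num_images, image_repeats, epochs, batch_size=1):
    """Assumption: each epoch takes ceil(images * repeats / batch_size) steps."""
    return math.ceil(num_images * image_repeats / batch_size) * epochs


# 16 training images, 100 repeats, 1 epoch, as in this tutorial.
print(samples_seen(16, 100, 1))
print(optimizer_steps(16, 100, 1, batch_size=3))
```

Doubling the epochs or doubling the image repeats has the same effect on this estimate, which is why you can leave one fixed and tune the other.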

Learning rate

How large a step each update to the model takes.

Trigger keyword

The token associated with your subject.

Lora name

The name of your LoRA file. In AUTOMATIC1111, it looks like <lora:AndyLau001:1> when you use the LoRA.

Lora output path

A folder in Google Drive where the LoRA file will be saved. The default path lets you use the LoRA with our AUTOMATIC1111 Google Colab notebook without moving the file.

Step 3: Run the notebook

Run the notebook by clicking the Play button on the left.

It will ask you to connect to your Google Drive.

You must accept because there’s no easy way to download the final LoRA model from Google Colab.

Click Choose Files and select your training images. (The images, not the zip file.)

It may prompt you to restart the runtime. Click Cancel.

The notebook will take a while to finish. It will:

  • Set up the training software
  • Generate captions for your images. You can find them in the project folder in your Google Drive. They are the .txt files with the same name as your images.
  • Train the LoRA model.

The LoRA model is saved in your Google Drive folder: AI_PICS > Lora in the default setting.
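If you want to locate the files afterwards, the sketch below builds the paths you would expect with the default settings. It is inferred from the folder names in this tutorial and from the paths visible in the error logs quoted in the comments (for example `.../AI_PICS/training/GudiaV1/100_GudiaV1/`), so the `<repeats>_<project>` sub-folder name and the `.safetensors` file name are assumptions, and `expected_paths` is a made-up helper.

```python
from pathlib import PurePosixPath

# Google Drive mount point as it appears in the logs quoted in the comments.
DRIVE_ROOT = PurePosixPath("/content/drive/MyDrive")


def expected_paths(project, image_repeats, lora_name):
    """Return where the images, captions and LoRA file are expected to be."""
    project_dir = DRIVE_ROOT / "AI_PICS" / "training" / project
    return {
        "project_dir": project_dir,
        # Assumption: images and captions sit in a "<repeats>_<project>" sub-folder.
        "image_dir": project_dir / f"{image_repeats}_{project}",
        # Assumption: the output file is "<lora_name>.safetensors".
        "lora_file": DRIVE_ROOT / "AI_PICS" / "Lora" / f"{lora_name}.safetensors",
    }


paths = expected_paths("AndyLau", 100, "AndyLau001")
for label, path in paths.items():
    print(f"{label}: {path}")
```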

Note: Google Colab may not work well with browsers other than Chrome. Try using Chrome if you experience an issue uploading the images.

When you are done, don’t forget to click the caret on the top right and select Disconnect and delete runtime. Otherwise, it will keep consuming your compute credits.

Using the LoRA

If you save the LoRA in the default output location (AI_PICS/Lora), you can easily use the Stable Diffusion Colab Notebook to load it.

Open AUTOMATIC1111 Stable Diffusion WebUI in Google Colab. Select the Lora tab and click the LoRA you just created.

Here are the prompt and the negative prompt:

Andy Lau in a suit, full body <lora:AndyLau001:1>

ugly, deformed, nsfw, disfigured

Since we have used “Andy Lau” as the triggering keyword, you will need it in the prompt for it to take effect.
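The prompt has two parts that are easy to forget: the trigger keyword and the `<lora:name:weight>` tag. The sketch below assembles them. The helper names are hypothetical, and adding the trigger keyword automatically when it is missing is just one possible convenience, not something AUTOMATIC1111 does for you.

```python
def lora_tag(lora_name, weight=1.0):
    """Format the tag AUTOMATIC1111 uses to apply a LoRA, e.g. <lora:AndyLau001:1>."""
    return f"<lora:{lora_name}:{weight:g}>"


def build_prompt(description, trigger, lora_name, weight=1.0):
    """Make sure the trigger keyword is present, then append the LoRA tag."""
    if trigger.lower() not in description.lower():
        description = f"{trigger}, {description}"
    return f"{description} {lora_tag(lora_name, weight)}"


print(build_prompt("Andy Lau in a suit, full body", "Andy Lau", "AndyLau001"))
print(build_prompt("a man in a suit", "Andy Lau", "AndyLau001", weight=0.8))
```

Lowering the last number in the tag reduces how strongly the LoRA is applied.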

Although the LoRA is trained on the Stable Diffusion v1.5 model, it works equally well with the Realistic Vision v2 model.

Here are the results of the Andy Lau LoRA.

Revise captions

The captions are generated automatically during training. But they can be incorrect sometimes. You may want to revise them and retrain your model.

To do so, go to your Google Drive and navigate to the project folder. (AI_PICS > training > AndyLau with the default setting).

You should see a sub-folder that contains all your images and captions. They are the files ending with .txt and with the same names as the images.

Revise them to fit the image.
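If you would rather review all captions in one place, a short script can pair each image with its .txt file and flag images that have no caption. This is a sketch that assumes the images and captions sit side by side in one folder. The function name, file names, and caption text are made up for the demonstration, which runs in a temporary folder instead of Google Drive.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def caption_report(folder):
    """Map each image file name to its caption text, or None if it has no caption."""
    report = {}
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() in IMAGE_SUFFIXES:
            caption = image.with_suffix(".txt")
            if caption.exists():
                report[image.name] = caption.read_text(encoding="utf-8").strip()
            else:
                report[image.name] = None
    return report


# Demonstration with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "andy01.jpg").write_bytes(b"")
    Path(tmp, "andy01.txt").write_text("Andy Lau, a man in a suit\n", encoding="utf-8")
    Path(tmp, "andy02.png").write_bytes(b"")
    demo_report = caption_report(tmp)

for name, text in demo_report.items():
    print(name, "->", text if text is not None else "MISSING CAPTION")
```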

In the notebook, check Skip_image_upload and run the notebook to start a new training. You can optionally change the Lora_name to avoid overwriting the previous one.

You don’t need to disconnect and reconnect the notebook for new training.

General tips

Experimentation

Training a LoRA model requires patience and experimentation. You should treat the default parameters as a starting point.

Observe the result and change the settings one at a time. Compare results using the same seeds. Generate multiple images before drawing conclusions.

Just like a good old scientist would do.

A systematic approach may take longer. But in the end, the knowledge and intuition you gain will make you a better trainer.
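One way to stay systematic is to write the experiment plan down before training. The sketch below generates runs that each differ from a baseline in exactly one setting, plus a fixed list of seeds to reuse when comparing the results. The baseline values echo this tutorial (100 image repeats, 1 epoch, and the learning rate of 0.0001 seen in the logged commands). The alternative learning rate, the seed list, and the function name are arbitrary choices for illustration.

```python
def one_at_a_time(baseline, alternatives):
    """Yield (changed_setting, config) pairs that differ from baseline in one key."""
    for name, values in alternatives.items():
        for value in values:
            if value != baseline[name]:
                yield name, {**baseline, name: value}


baseline = {"image_repeats": 100, "epochs": 1, "learning_rate": 1e-4}
alternatives = {"image_repeats": [50, 150], "learning_rate": [5e-5]}
runs = list(one_at_a_time(baseline, alternatives))

# Reuse the same seeds for every run so the images are directly comparable.
SEEDS = [1, 2, 3, 4]

for changed, config in runs:
    print(f"changed {changed}: {config}, test with seeds {SEEDS}")
```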

Overcook and undercook

We say a LoRA is overcooked when images of your training subject look oversaturated, much like images generated with a very high CFG value.

At the other extreme, a LoRA is undercooked when the generated images don’t resemble your subject closely enough.

The images below used a LoRA that’s undercooked. This guy doesn’t look quite like Andy Lau. (50 repeats)

The LoRA is trained just right for the following images. (100 repeats)

The following images are overcooked (150 image repeats). Notice that the backgrounds are all the same.

You can increase the image repeats or number of epochs to cook more. Likewise, reduce them to cook less.

There’s probably a setting between 100 and 150 that gives the optimal result.

If training is so aggressive that the LoRA overcooks easily, you can lower the learning rate. Each update to the model will then be smaller, so you may need to increase the image repeats or the number of epochs to compensate.

Reference

LoRA training parameters – An authoritative reference of training parameters.

LEARN TO MAKE LoRA – A graphical guide to training LoRA.

kohya_ss Documentation – English translation of kohya_ss manual.

FAQ

I got ModuleNotFoundError: No module named ‘torch._custom_ops’

You likely restarted the runtime when prompted. Click Cancel when asked to restart the runtime.
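The error logs quoted in the comments below show torchvision trying to import `torch._custom_ops`, and pip warning that the installed torchvision expects a different torch version. That suggests a mismatch between the two packages, although I have not confirmed this is the only cause. The sketch below, which is not part of the notebook, reports the installed versions and whether the module can be found. The helper names are made up.

```python
import importlib.util
from importlib import metadata


def has_module(dotted_name):
    """True if the module can be located, without fully importing it."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        return False  # a parent package is missing


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def diagnose():
    """Collect the facts relevant to the torch._custom_ops error."""
    return {
        "torch": installed_version("torch"),
        "torchvision": installed_version("torchvision"),
        "torch._custom_ops found": has_module("torch._custom_ops"),
    }
```

Run `print(diagnose())` in a notebook cell. If the module is not found, disconnecting and deleting the runtime and then rerunning the notebook, as some commenters did, gives you a fresh environment.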

By Andrew

Andrew is an experienced software engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, and education. He has a doctorate degree in engineering.

130 comments

  1. Hey Andy!

    I haven’t used the notebook for a few weeks but have been trying to train a LORA over the past day or so and I keep encountering this error message. Any ideas?

    File “/content/venv/lib/python3.10/site-packages/accelerate/commands/launch.py”, line 703, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    subprocess.CalledProcessError: Command '['/content/venv/bin/python3', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=stable-diffusion-v1-5/stable-diffusion-v1-5', '--train_data_dir=/content/drive/MyDrive/AI_PICS/training/Lexi001', '--resolution=512,650', '--output_dir=/content/drive/MyDrive/AI_PICS/Lora', '--network_alpha=64', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=64', '--output_name=Lexi001', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=3', '--max_train_steps=100000', '--save_every_n_epochs=99999', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--max_train_epochs=1', '--mem_eff_attn', '--xformers', '--bucket_no_upscale', '--noise_offset=0.05']' returned non-zero exit status 1.

    I haven’t changed any of the default settings at all.

      1. It’s working at the moment, going through the steps of training the LORA which is further than it was getting with mine.

        Does that suggest a problem with the pictures I was using? They were all either JPG, JPEG or PNG extensions…

        1. Yes, it is likely from your input images. You can try making them as close to the sample images as possible, like filenames (no special characters), file extension, and sizes.

  2. Hi, thank you for all this information. What I still don’t understand is how I train and finetune (a Lora or full model in dreambooth) in just one detail of a picture, e.g. a hand or a natural flaccid penis for fine art photorealistic images, without changing the whole appearance (and charm) of a model?
    I like to work with SDXL or Juggernaut XL, as the results are most natural in my opinion. SD1.5 usually lacks details because of the small size when its about depicting whole (posing) bodies. Although SDXL and Juggernaut are much better with fingers and hand poses, e.g. male genitals are very seldom generated anatomically correct, at least in text2img. The only workaround to correct it in an existing generated image is using 0001SoftRealistic v187 or RealDream 12 models in img2img with low strength around 40 (to not change the emotion and face in the image too much.
    There are Loras on CIVITAI for each pose or body detail (faces, butts, even male genitals,…), but nearly all change the faces in an unnatural way. Is there a way to train a Lora or finetune a model with dreambooth just for a certain body-detail like a hand or is the only way to mask and inpaint the picture? And if so, would you select just pictures of a hand (without the body) or diverse whole portraits with interesting hand-poses, otherwise SDXL would just generate a hand without the body?

  3. Hi there, I ran into this error today, It was working well a month ago when I last tried though:

    Traceback (most recent call last):
    File “/content/kohya_ss/./train_network.py”, line 990, in
    trainer.train(args)
    File “/content/kohya_ss/./train_network.py”, line 178, in train
    train_dataset_group = config_util.generate_dataset_group_by_blueprint(blueprint.dataset_group)
    File “/content/kohya_ss/library/config_util.py”, line 494, in generate_dataset_group_by_blueprint
    dataset.make_buckets()
    File “/content/kohya_ss/library/train_util.py”, line 738, in make_buckets
    info.image_size = self.get_image_size(info.absolute_path)
    File “/content/kohya_ss/library/train_util.py”, line 971, in get_image_size
    image = Image.open(image_path)
    File “/content/venv/lib/python3.10/site-packages/PIL/Image.py”, line 3498, in open
    raise UnidentifiedImageError(msg)
    PIL.UnidentifiedImageError: cannot identify image file ‘/content/drive/MyDrive/AI_PICS/training/GudiaV1/100_GudiaV1/IMG_4848.jpeg’
    Traceback (most recent call last):
    File “/content/venv/bin/accelerate”, line 8, in
    sys.exit(main())
    File “/content/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py”, line 48, in main
    args.func(args)
    File “/content/venv/lib/python3.10/site-packages/accelerate/commands/launch.py”, line 1097, in launch_command
    simple_launcher(args)
    File “/content/venv/lib/python3.10/site-packages/accelerate/commands/launch.py”, line 703, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    subprocess.CalledProcessError: Command '['/content/venv/bin/python3', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=stable-diffusion-v1-5/stable-diffusion-v1-5', '--train_data_dir=/content/drive/MyDrive/AI_PICS/training/GudiaV1', '--resolution=512,650', '--output_dir=/content/drive/MyDrive/AI_PICS/Lora', '--network_alpha=64', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=64', '--output_name=GudiaV1', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=3', '--max_train_steps=100000', '--save_every_n_epochs=99999', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--max_train_epochs=2', '--mem_eff_attn', '--xformers', '--bucket_no_upscale', '--noise_offset=0.05']' returned non-zero exit status 1.

    1. Hi, I just ran it and it is working correctly. You can try following the original link to the notebook and test with the training images I supplied to ensure everything is working on your end.

  4. Hello, thanks for your services. I’ve run into these errors today, where it had been working yesterday.

    “404 Not Found [IP: 185.125.190.82 80]
    Fetched 2,467 kB in 2s (1,114 kB/s)
    E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/universe/p/python3.10/python3.10-venv_3.10.12-1%7e22.04.3_amd64.deb 404 Not Found [IP: 185.125.190.82 80]
    E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
    /content
    The virtual environment was not created successfully because ensurepip is not
    available. On Debian/Ubuntu systems, you need to install the python3-venv
    package using the following command.

    apt install python3.10-venv

    You may need to use sudo with that command. After installing the python3-venv
    package, recreate your virtual environment.

    Failing command: /content/venv/bin/python3″

    And also:

    “/content/kohya_ss
    env: PYTHONPATH=/env/python:/content/kohya_ss
    env: PYTHONPATH=/env/python:/content/kohya_ss
    Traceback (most recent call last):
    File “/content/kohya_ss/finetune/make_captions.py”, line 9, in
    from PIL import Image
    ModuleNotFoundError: No module named ‘PIL’
    /bin/bash: line 1: /content/venv/bin/pip: No such file or directory
    /bin/bash: line 1: /content/venv/bin/accelerate: No such file or directory”

    1. Hi, I have put in a fix for this particular issue. I will test the whole network tonight but feel free to run it to see if that works for you.

  5. just started getting the following error.
    ImportError: cannot import name ‘split_torch_state_dict_into_shards’ from ‘huggingface_hub’ (/usr/local/lib/python3.10/dist-packages/huggingface_hub/__init__.py)

  6. I got an error trying to use Linaqruf/anything-v3.0 to train, but it worked for runwayml/stable-diffusion-v1-5.
    Can you check?

  7. Oh my goodness, a colab journal that FINALLY works! (I bought it on gumroad btw) the models trained by the hollowstrawberry one produces nonsense results currently, but the models trained with yours actually produce GREAT results. Thanks!!

  8. Hello,
    I keep getting this error when training using SDXL notebook even though it was working just fine before,
    WARNING: The following packages were previously imported in this runtime:
    [pydevd_plugins]
    You must restart the runtime in order to use newly installed versions.

    And then the code stopped at:

    File “/usr/local/lib/python3.10/dist-packages/torchvision/_meta_registrations.py”, line 4, in
    import torch._custom_ops
    ModuleNotFoundError: No module named ‘torch._custom_ops’

    I’ve tried to load new notebook in case you updated it but still the same.

      1. It should be fixed. please confirm you see the updated version on 3/15 for the SDXL notebook. If you still see error, post the error message.

      1. I just bought this and same error:
        ModuleNotFoundError: No module named ‘torch._custom_ops’

  9. Pretrained_Model_Name ….runwayml/stable-diffusion-v1-5

    What is this path. Am I supposed to change the Path to where checkpoint models are in my Google drive ? I do not have .runwayml/stable-diffusion-v1-5

      1. Maybe my question was not clear.
        1) What is “runwayml” in the Path?
        2) Is this where model stable-diffusion-v1-5 resides in your google drive?

        I changed the path to where my models are, still not able to get Lora Model show up after the colab finishes running

        1. runwayml is a Hugging Face user name. It is the company Runway ML.

          The SD 1.5 model is not in your google drive. It is on Hugging Face’s website. The path “runwayml/stable-diffusion-v1-5” refers to the location on Hugging Face.

          You need to use, for example, A1111 to load the LoRA.

          I just added the second cell to generate images with your LoRA.

      2. Also, the notebook runs and stops with this error message
        “ModuleNotFoundError: No module named ‘torch._custom_ops'”

        More Detail =================
        You are in ‘detached HEAD’ state. You can look around, make experimental
        changes and commit them, and you can discard any commits you make in this
        state without impacting any branches by switching back to a branch.

        If you want to create a new branch to retain commits you create, you may
        do so (now or later) by using -c with the switch command. Example:

        git switch -c <new-branch-name>

        Or undo this operation with:

        git switch -

        Turn off this advice by setting config variable advice.detachedHead to false

        HEAD is now at ed4e3b0 Merge pull request #1476 from bmaltais/dev2
        env: PYTHONPATH=/env/python:/content/kohya_ss
        Traceback (most recent call last):
        File “/content/kohya_ss/finetune/make_captions.py”, line 13, in
        from torchvision import transforms
        File “/usr/local/lib/python3.10/dist-packages/torchvision/__init__.py”, line 6, in
        from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
        File “/usr/local/lib/python3.10/dist-packages/torchvision/_meta_registrations.py”, line 4, in
        import torch._custom_ops
        ModuleNotFoundError: No module named ‘torch._custom_ops’

        1. same problem, can’t train a 1.5 LoRa because of the previous error.

          ModuleNotFoundError: No module named ‘torch._custom_ops’
          Traceback (most recent call last):
          File “/usr/local/bin/accelerate”, line 5, in
          from accelerate.commands.accelerate_cli import main
          File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py”, line 19, in
          from accelerate.commands.estimate import estimate_command_parser
          File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/estimate.py”, line 34, in
          import timm
          File “/usr/local/lib/python3.10/dist-packages/timm/__init__.py”, line 2, in
          from .models import create_model, list_models, is_model, list_modules, model_entrypoint, \
          File “/usr/local/lib/python3.10/dist-packages/timm/models/__init__.py”, line 1, in
          from .beit import *
          File “/usr/local/lib/python3.10/dist-packages/timm/models/beit.py”, line 49, in
          from timm.data import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
          File “/usr/local/lib/python3.10/dist-packages/timm/data/__init__.py”, line 5, in
          from .dataset import ImageDataset, IterableImageDataset, AugMixDataset
          File “/usr/local/lib/python3.10/dist-packages/timm/data/dataset.py”, line 12, in
          from .parsers import create_parser
          File “/usr/local/lib/python3.10/dist-packages/timm/data/parsers/__init__.py”, line 1, in
          from .parser_factory import create_parser
          File “/usr/local/lib/python3.10/dist-packages/timm/data/parsers/parser_factory.py”, line 3, in
          from .parser_image_folder import ParserImageFolder
          File “/usr/local/lib/python3.10/dist-packages/timm/data/parsers/parser_image_folder.py”, line 11, in
          from timm.utils.misc import natural_key
          File “/usr/local/lib/python3.10/dist-packages/timm/utils/__init__.py”, line 2, in
          from .checkpoint_saver import CheckpointSaver
          File “/usr/local/lib/python3.10/dist-packages/timm/utils/checkpoint_saver.py”, line 15, in
          from .model import unwrap_model, get_state_dict
          File “/usr/local/lib/python3.10/dist-packages/timm/utils/model.py”, line 8, in
          from torchvision.ops.misc import FrozenBatchNorm2d
          File “/usr/local/lib/python3.10/dist-packages/torchvision/__init__.py”, line 6, in
          from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
          File “/usr/local/lib/python3.10/dist-packages/torchvision/_meta_registrations.py”, line 4, in
          import torch._custom_ops
          ModuleNotFoundError: No module named ‘torch._custom_ops’

          1. Thanks for the update. How do I reload the notebook. Will simply opening the notebook again will do ?

  10. Hey,

    Wanted to thank you for this, it works like a charm

    I’m getting this error suddenly:

    #@title Upload images and start training
    #@markdown Begineers: Use a different `Project_folder` each time when you upload the images.
    (removed)
    --mem_eff_attn --xformers --bucket_no_upscale --noise_offset=0.05

    Any idea what’s causing the issue? Thanks!

      1. Hey,

        Sorry about that. No worries, just got it fixed after browsing through the other comments.

        But I tried uploading the newly produced LORA to mage.space for testing, and I’m receiving this error instead:

        “Error message: ‘NoneType’ object has no attribute ‘groups'”

        I’ve cross-tested with other LORA’s that I’ve produced and uploaded successfully in the past, and seem to be receiving the same error as above for some weird reason.

        Any idea what might be causing the issue? Thanks!

        1. Hi, I tested a LoRA made by this version of the notebook and it is working correctly on A1111. You can contact Mage to see what they are looking for in the file and let me know.

  11. Hi! First off, love this website! So much info, learned a lot. But i just can’t seem to make this LoRA notebook work for me. I get the following error:

    Already installed.
    python3: can’t open file ‘/content/finetune/make_captions.py’: [Errno 2] No such file or directory
    /usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ‘/usr/local/lib/python3.10/dist-packages/torchvision/image.so: undefined symbol: _ZN3c104cuda9SetDeviceEi’If you don’t plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
    warn(
    The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `1`
    `--num_machines` was set to a value of `1`
    `--mixed_precision` was set to a value of `'no'`
    `--dynamo_backend` was set to a value of `'no'`
    To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
    /usr/bin/python3: can’t open file ‘/content/./train_network.py’: [Errno 2] No such file or directory
    Traceback (most recent call last):
    File “/usr/local/bin/accelerate”, line 8, in
    sys.exit(main())
    File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py”, line 47, in main
    args.func(args)
    File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py”, line 1023, in launch_command
    simple_launcher(args)
    File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py”, line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    subprocess.CalledProcessError: Command '['/usr/bin/python3', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=/content/drive/MyDrive/AI_PICS/training/BrentCorrigan6', '--resolution=512,650', '--output_dir=/content/drive/MyDrive/AI_PICS/Lora', '--network_alpha=64', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=64', '--output_name=BrentCorrigan001', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=3', '--max_train_steps=100000', '--save_every_n_epochs=99999', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--max_train_epochs=1', '--mem_eff_attn', '--xformers', '--bucket_no_upscale', '--noise_offset=0.05']' returned non-zero exit status 2.

    1. I saw that Kwandry had the same issue so i did what he stated, “runtime needed to get disconnected and reconnected” But now i get this error:

      ERROR: pip’s dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
      torchaudio 2.1.0+cu121 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
      torchdata 0.7.0 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
      torchtext 0.16.0 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
      torchvision 0.16.0+cu121 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
      Successfully installed GitPython-3.1.42 antlr4-python3-runtime-4.9.3 docker-pycreds-0.4.0 gitdb-4.0.11 lit-17.0.6 mypy-extensions-1.0.0 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 omegaconf-2.3.0 pathtools-0.1.2 pyre-extensions-0.0.29 sentry-sdk-1.40.6 setproctitle-1.3.3 smmap-5.0.1 tk-0.1.0 tokenizers-0.13.3 torch-2.0.1 transformers-4.30.2 triton-2.0.0 typing-inspect-0.9.0 voluptuous-0.13.1 wandb-0.15.0 xformers-0.0.20
      WARNING: The following packages were previously imported in this runtime:
      [pydevd_plugins]
      You must restart the runtime in order to use newly installed versions.

      1. Also this pops up, but even if i restart the runtime it keeps popping up.

        WARNING: The following packages were previously imported in this runtime:
        [pydevd_plugins]
        You must restart the runtime in order to use newly installed versions.

        Restarting will lose all runtime state, including local variables.

  12. Hello, I am getting the following error after the image upload step is completed, both with my own images as well as the sample image. This only just started recently (I was able to use this notebook successfully as recently as a few weeks ago) so did something change or am I messing up something? Error is as follows:

    Already installed.
    python3: can’t open file ‘/content/finetune/make_captions.py’: [Errno 2] No such file or directory
    /usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ‘/usr/local/lib/python3.10/dist-packages/torchvision/image.so: undefined symbol: _ZN3c104cuda9SetDeviceEi’If you don’t plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
    warn(
    The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `1`
    `--num_machines` was set to a value of `1`
    `--mixed_precision` was set to a value of `'no'`
    `--dynamo_backend` was set to a value of `'no'`
    To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
    /usr/bin/python3: can’t open file ‘/content/./train_network.py’: [Errno 2] No such file or directory
    Traceback (most recent call last):
    File “/usr/local/bin/accelerate”, line 8, in
    sys.exit(main())
    File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py”, line 47, in main
    args.func(args)
    File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py”, line 1023, in launch_command
    simple_launcher(args)
    File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py”, line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    subprocess.CalledProcessError: Command '['/usr/bin/python3', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=/content/drive/MyDrive/AI_PICS/training/AndyLau3', '--resolution=512,650', '--output_dir=/content/drive/MyDrive/AI_PICS/Lora', '--network_alpha=64', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=64', '--output_name=AndyLau001', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=3', '--max_train_steps=100000', '--save_every_n_epochs=99999', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--max_train_epochs=1', '--mem_eff_attn', '--xformers', '--bucket_no_upscale', '--noise_offset=0.05']' returned non-zero exit status 2.

    1. Hi, I just ran the SD 1.5 training notebook and it worked fine.

      The folder names in the output look incorrect. Perhaps your notebook was inadvertently changed. Please follow the notebook link on this site and rerun the notebook.

      Let me know how it goes.

      1. Figured it out, runtime needed to get disconnected and reconnected for it to work properly. Thanks!

  13. Hi – I’m using your google colab notebook and when I hit run, I’m not seeing the message “Mounted at…”, instead I’m getting the message “Drive already mounted at /content/drive” and I’m unable to select “choose files” / clicking it does not do anything. How should I proceed?

        1. Got it working – thanks!

          1. What # of epochs do you recommend? (I saw 10-100 recommended on other sites but noticed you had 1 in the tutorial)
          2. Have you tested how # of training images affects results? (What # is ideal?)
          3. What is the ideal value for Image_repeats?

          1. The number of training steps is roughly number of images x Image_repeats x number of epochs.

            In the SD LoRA workflow, you can leave the number of epochs at 1 and change the image repeats. The initial value in the notebook should work for most realistic faces, but the right value differs for each dataset, depending on the quality and number of images. Try changing image_repeats up and down, and use the same seeds to generate sample images. The ideal value is one that trains just enough to show what you want without creating the overcooking artifacts mentioned in the article.
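The rule of thumb in the reply above can be sketched as a small helper. This is only an illustration: the function name, the `batch_size` parameter, and the example value of 100 repeats are assumptions, not settings taken from the notebook. Dividing by the batch size is also an assumption about how the trainer counts optimizer steps.

```python
import math


def estimate_training_steps(num_images, image_repeats, epochs=1, batch_size=1):
    """Rough step count: images x repeats x epochs, divided by batch size.

    Hypothetical helper for illustration; the trainer's own count may differ.
    """
    samples_per_epoch = num_images * image_repeats
    # Assumption: one step consumes one batch, and a partial batch still counts.
    steps_per_epoch = math.ceil(samples_per_epoch / batch_size)
    return steps_per_epoch * epochs


# 16 training images with an assumed 100 repeats and 1 epoch
print(estimate_training_steps(16, 100))                # 1600
print(estimate_training_steps(16, 100, batch_size=3))  # 534
```

Raising or lowering `image_repeats` scales the step count proportionally, which is why changing that single value is enough to control how long the LoRA trains.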

  14. A nice update, thanks. I was tired of dealing with Gradio 🙂 I learned a lot from your website, thanks for your effort.

  15. There seems to be a new issue. I get the following error:

    ImportError: cannot import name ‘set_documentation_group’ from ‘gradio_client.documentation’ (/usr/local/lib/python3.10/dist-packages/gradio_client/documentation.py)

  16. I followed the guide to train a Lora for SDXL and purchased the colab notebook.
    I followed every step described in the guide exactly, but when I click on “Start training”, after some time the colab script dies with the following message:

    [Dataset 0]
    loading image sizes.
    100% 24/24 [00:00<00:00, 313.46it/s]
    make buckets
    number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
    bucket 0: resolution (1024, 1024), count: 24
    mean ar error (without repeats): 0.0
    preparing accelerator
    loading model for process 0/1
    load Diffusers pretrained models: stabilityai/stable-diffusion-xl-base-1.0, variant=fp16
    Loading pipeline components…: 33% 2/6 [00:02<00:05, 1.43s/it]Traceback (most recent call last):
    File "/usr/local/bin/accelerate", line 8, in
    sys.exit(main())
    File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py”, line 47, in main
    args.func(args)
    File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py”, line 1023, in launch_command
    simple_launcher(args)
    File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py”, line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    subprocess.CalledProcessError: Command ‘[‘/usr/bin/python3’, ‘./sdxl_train_network.py’, ‘–enable_bucket’, ‘–min_bucket_reso=256’, ‘–max_bucket_reso=2048’, ‘–pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0’, ‘–train_data_dir=/content/drive/MyDrive/AI_PICS/training/elenvel’, ‘–resolution=1024,1024’, ‘–output_dir=/content/drive/MyDrive/AI_PICS/Lora’, ‘–network_alpha=32’, ‘–training_comment=3 repeats. More info: https://civitai.com/articles/1771‘, ‘–save_model_as=safetensors’, ‘–network_module=networks.lora’, ‘–text_encoder_lr=3e-05’, ‘–unet_lr=3e-05’, ‘–network_dim=32’, ‘–output_name=elenvel’, ‘–lr_scheduler_num_cycles=50’, ‘–no_half_vae’, ‘–learning_rate=3e-05’, ‘–lr_scheduler=constant’, ‘–train_batch_size=3’, ‘–max_train_steps=400’, ‘–save_every_n_epochs=1’, ‘–mixed_precision=fp16’, ‘–save_precision=fp16’, ‘–caption_extension=.txt’, ‘–cache_latents’, ‘–cache_latents_to_disk’, ‘–optimizer_type=AdamW’, ‘–max_train_epochs=50’, ‘–max_data_loader_n_workers=0’, ‘–caption_dropout_rate=0.05’, ‘–bucket_reso_steps=64’, ‘–min_snr_gamma=5’, ‘–gradient_checkpointing’, ‘–xformers’, ‘–noise_offset=0.0′]’ died with .

    This is very infuriating tbh. Any idea? Any fix required?

    Thanks.

  17. Hello,

    So a LoRA is a small model of a specific object, instead of a huge, more general checkpoint?
    It can be a person or an object?
    And it will only create images with that specific person or object?
    Can I use them with other models to make them more specific?

    For instance, I have a handmade cabinet that looks different. Would I take pictures of it to create a LoRA? Then if I said cabinet, it would draw MY unique cabinet. But if I said Andy Lau standing by a cabinet, would it show that Cantopop actor with my cabinet?

    Or am I missing the point here?

    Thanks,

    V

    1. Sounds about right. You will need to take a few more photos to create a lora.

      Creating a LoRA for a cabinet should work, but it is a less common use case, so you may need to tweak the training parameters.

      Using two LoRAs for objects should also work. But if you have a hard time making them show up at the same time, you can generate an image with Andy first and use inpainting to add the cabinet.

  18. I keep getting an error, I tried with my own images, the images you provide, and multiple “Pretrained model name or path” options as well, but it seems to be running into the same issue every time:

    subprocess.CalledProcessError: Command ‘[‘/usr/bin/python3’, ‘./sdxl_train.py’, ‘–enable_bucket’, ‘–min_bucket_reso=256’, ‘–max_bucket_reso=2048’, ‘–pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0’, ‘–train_data_dir=/content/drive/MyDrive/AI_PICS/training/emma’, ‘–resolution=512,512’, ‘–output_dir=/content/drive/MyDrive/AI_PICS/Lora’, ‘–save_model_as=safetensors’, ‘–output_name=first’, ‘–lr_scheduler_num_cycles=1’, ‘–max_data_loader_n_workers=0’, ‘–learning_rate=1e-05’, ‘–lr_scheduler=cosine’, ‘–lr_warmup_steps=1’, ‘–train_batch_size=1’, ‘–max_train_steps=13’, ‘–save_every_n_epochs=10’, ‘–mixed_precision=fp16’, ‘–save_precision=fp16’, ‘–cache_latents’, ‘–optimizer_type=AdamW8bit’, ‘–max_data_loader_n_workers=0’, ‘–bucket_reso_steps=64’, ‘–xformers’, ‘–bucket_no_upscale’, ‘–noise_offset=0.0′]’ died with .

      1. If you mean this:
        Step 5: Enter training settings
        Source model
        Go to the LoRA page > Training tab > Source model tab.
        Then yeah, I did.

        1. I tested the notebook and it’s working as expected. I notice from your error message that it is calling the incorrect script, sdxl_train.py. It should be sdxl_train_network.py. I think there’s something wrong in your settings. You can follow the tutorial again. If it doesn’t work, post or send me the whole log.

          1. I have the exact same error, and I followed the SDXL guide at least 5 times, with 100% precision! Something is wrong with the colab notebook. Please fix it, thanks.

          2. An update. I don’t think it is possible to train an SDXL LoRA on a free colab notebook with the current software. You will need to have a Google Colab subscription.

  19. Hi. I’m learning how to make Loras using your notebook. I noticed you didn’t mention anything about regularization. Would you please also add a section that covers that?

    1. Also, another thought. In the dreambooth example, you mentioned that if we include the class name “woman” it leverages all the information that the model has previously learnt about women in the model, which is good. Why have we not done the same here in the Lora section? I don’t see a mention of the token “man”?

    2. Hi, the preset already uses regularization. You normally need to tweak it if you run into issues. It is out of scope for this beginner’s post.

  20. I keep getting the same error despite following the guide perfectly.

    subprocess.CalledProcessError: Command ‘[‘/usr/bin/python3’, ‘./train_network.py’, ‘–pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5’, ‘–train_data_dir=/content/drive/MyDrive/AI_PICS/training/Test’, ‘–resolution=512,650’, ‘–output_dir=/content/drive/MyDrive/AI_PICS/Lora’, ‘–network_alpha=64’, ‘–save_model_as=safetensors’, ‘–network_module=networks.lora’, ‘–text_encoder_lr=5e-05’, ‘–unet_lr=0.0001’, ‘–network_dim=64’, ‘–output_name=last’, ‘–lr_scheduler_num_cycles=1’, ‘–no_half_vae’, ‘–learning_rate=0.0001’, ‘–lr_scheduler=constant’, ‘–train_batch_size=3’, ‘–max_train_steps=7750’, ‘–save_every_n_epochs=1’, ‘–mixed_precision=fp16’, ‘–save_precision=fp16’, ‘–seed=1234’, ‘–caption_extension=.txt’, ‘–cache_latents’, ‘–optimizer_type=AdamW’, ‘–max_data_loader_n_workers=1’, ‘–clip_skip=2’, ‘–bucket_reso_steps=64’, ‘–mem_eff_attn’, ‘–xformers’, ‘–bucket_no_upscale’, ‘–noise_offset=0.05′]’ returned non-zero exit status 1.

  21. I sometimes get it to work, with great results too, but sometimes, at the end, it returns:

    steps: 100% 3600/3600 [2:13:43<00:00, 2.23s/it, loss=nan]

    Which you say I should not have. What am I doing that is causing this? How can I correct it?

    1. This is usually an issue with the settings. Make sure you have followed the ones in the tutorial. You can also try reducing the learning rate, and make sure bucketing is on.

  22. Massive fail mate – no matter what credentials I enter I get “incorrect credentials” when logging in at the training step

      1. That’s exactly what I’m doing… are there any constraints on the credentials you can pick? It’s a strong password

      2. I have the same problem. It doesn’t let you specify the credentials. The only way to login is using the a/a default setting. Is there a way to change this?

  23. Hi! When training the Lora, the google colab gives me the following error and stops:

    [Dataset 0]
    loading image sizes.
    100% 1000/1000 [00:01<00:00, 794.47it/s]
    prepare dataset
    preparing accelerator
    loading model for process 0/1
    load Diffusers pretrained models: kohbanye/pixel-art-style
    text_encoder/model.safetensors not found
    UNet2DConditionModel: 64, 8, 768, False, False
    U-Net converted to original U-Net
    Enable memory efficient attention for U-Net
    Traceback (most recent call last):
    File "/content/kohya_ss/./train_network.py", line 990, in
    trainer.train(args)
    File “/content/kohya_ss/./train_network.py”, line 222, in train
    vae.set_use_memory_efficient_attention_xformers(args.xformers)
    File “/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py”, line 235, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
    File “/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py”, line 231, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
    File “/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py”, line 231, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
    File “/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py”, line 231, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
    File “/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py”, line 228, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
    File “/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py”, line 199, in set_use_memory_efficient_attention_xformers
    raise ValueError(
    ValueError: torch.cuda.is_available() should be True but is False. xformers’ memory efficient attention is only available for GPU
    Traceback (most recent call last):
    File “/usr/local/bin/accelerate”, line 8, in
    sys.exit(main())
    File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py”, line 47, in main
    args.func(args)
    File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py”, line 994, in launch_command
    simple_launcher(args)
    File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py”, line 636, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    subprocess.CalledProcessError: Command ‘[‘/usr/bin/python3’, ‘./train_network.py’, ‘–pretrained_model_name_or_path=kohbanye/pixel-art-style’, ‘–train_data_dir=/content/drive/MyDrive/CraftpixLora/Training/Craftpix’, ‘–resolution=512,650’, ‘–output_dir=/content/drive/MyDrive/CraftpixLora/Lora’, ‘–network_alpha=64’, ‘–save_model_as=safetensors’, ‘–network_module=networks.lora’, ‘–text_encoder_lr=5e-05’, ‘–unet_lr=0.0001’, ‘–network_dim=64’, ‘–output_name=Craftpix’, ‘–lr_scheduler_num_cycles=1’, ‘–no_half_vae’, ‘–learning_rate=0.0001’, ‘–lr_scheduler=constant’, ‘–train_batch_size=3’, ‘–max_train_steps=334’, ‘–save_every_n_epochs=1’, ‘–mixed_precision=fp16’, ‘–save_precision=fp16’, ‘–seed=1234’, ‘–caption_extension=.txt’, ‘–cache_latents’, ‘–optimizer_type=AdamW’, ‘–max_data_loader_n_workers=1’, ‘–clip_skip=2’, ‘–bucket_reso_steps=64’, ‘–mem_eff_attn’, ‘–xformers’, ‘–bucket_no_upscale’, ‘–noise_offset=0.05′]’ returned non-zero exit status 1.

    1. Hi, I just tested the notebook and it is running correctly (on Colab Pro). It appears that you have not connected to a GPU. Please check your runtime.
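The ValueError in the log above says `torch.cuda.is_available()` returned False. A quick way to check the runtime before starting training is to run a cell like the following. This is a minimal sketch; the helper name `gpu_is_available` is made up for illustration and is not part of the notebook.

```python
def gpu_is_available():
    """Return True if PyTorch can see a CUDA GPU, False otherwise."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment, so no GPU can be used.
        return False
    return bool(torch.cuda.is_available())


if gpu_is_available():
    print("GPU detected.")
else:
    print("No GPU detected. In Colab, check Runtime > Change runtime type.")
```

If this reports no GPU, the xformers option in the training command will fail in the same way as in the log above.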

  24. This comment at the very top of the page regarding enabling the bucketing option is worth repeating at the end when discussing settings on the Training/Parameters tab as it might go unnoticed or forgotten, but seems to be very important if photos are of different aspect ratios–important enough to make the entire operation fail if not enabled. Just saying.

  25. I bought the gumroad colab notebook

    running with a A100 GPU, I got this error :

    The following directories listed in your path were found to be non-existent: {PosixPath(‘/usr/local/lib/python3.10/dist-packages/cv2/../../lib64’)}
    /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.0-py3.10.egg/bitsandbytes/cuda_setup/main.py:166: UserWarning: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia did not contain [‘libcudart.so’, ‘libcudart.so.11.0’, ‘libcudart.so.12.0’] as expected! Searching further paths…
    warn(msg)
    The following directories listed in your path were found to be non-existent: {PosixPath(‘//ipykernel.pylab.backend_inline’), PosixPath(‘module’)}
    The following directories listed in your path were found to be non-existent: {PosixPath(‘–logtostderr –listen_host=172.28.0.12 –target_host=172.28.0.12 –tunnel_background_save_url=https’), PosixPath(‘//colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-a100-s-11pacqicclkc4 –tunnel_background_save_delay=10s –tunnel_periodic_background_save_frequency=30m0s –enable_output_coalescing=true –output_coalescing_required=true’)}
    The following directories listed in your path were found to be non-existent: {PosixPath(‘/datalab/web/pyright/typeshed-fallback/stdlib,/usr/local/lib/python3.10/dist-packages’)}
    The following directories listed in your path were found to be non-existent: {PosixPath(‘/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events’)}
    The following directories listed in your path were found to be non-existent: {PosixPath(‘/env/python’)}
    The following directories listed in your path were found to be non-existent: {PosixPath(‘http’), PosixPath(‘8013’), PosixPath(‘//172.28.0.1’)}
    CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths…
    /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.0-py3.10.egg/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate [‘libcudart.so’, ‘libcudart.so.11.0’, ‘libcudart.so.12.0’] files: {PosixPath(‘/usr/local/cuda/lib64/libcudart.so’), PosixPath(‘/usr/local/cuda/lib64/libcudart.so.11.0’)}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION= environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python …OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
    warn(msg)
    DEBUG: Possible options found for libcudart.so: {PosixPath(‘/usr/local/cuda/lib64/libcudart.so’), PosixPath(‘/usr/local/cuda/lib64/libcudart.so.11.0’)}
    CUDA SETUP: PyTorch settings found: CUDA_VERSION=117, Highest Compute Capability: 8.0.
    CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
    CUDA SETUP: Required library version not found: libbitsandbytes_cuda117.so. Maybe you need to compile it from source?
    CUDA SETUP: Defaulting to libbitsandbytes_cpu.so…

    ================================================ERROR=====================================
    CUDA SETUP: CUDA detection failed! Possible reasons:
    1. You need to manually override the PyTorch CUDA version. Please see: “https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
    2. CUDA driver not installed
    3. CUDA not installed
    4. You have multiple conflicting CUDA libraries
    5. Required library not pre-compiled for this bitsandbytes release!
    CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
    CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
    ================================================================================

    CUDA SETUP: Something unexpected happened. Please compile from source:
    git clone https://github.com/TimDettmers/bitsandbytes.git
    cd bitsandbytes
    CUDA_VERSION=117 make cuda11x
    python setup.py install
    CUDA SETUP: Setup Failed!

  26. How can I use different models? So far I can only use the three model links listed here. How do I use a different one for my training?

          1. Hi, must the model be from Hugging Face? I have created my own ckpt model, stored on a local drive and Google Drive.

  27. In the caption screen, there is a textbox that says “Prefix to add to BLIP caption”. Why not add the person name there?

  28. Hey Andrew,

    I just want to clear up something on the captioning. Does it matter if the thing you’re trying to train the LoRA on is split into different tags? How do you ensure it is just one phrase?

    Say your example – Andy Lau in a black jacket smoking a cigarette in front of a fenced in building
    When you look at the config for the LoRA in SD Web UI, the words “Andy” and “Lau” are split into single tags, as well as “black” and “jacket” – is this supposed to happen or is there a way to ensure it is one phrase when you edit the text files?

  29. Hey Andrew,
    I’ll admit that I’m new at this (which is why I bought your tutorial), but I can follow simple instructions.

    Your notebook isn’t working for me. I get errors just uploading the pictures from your tutorial.
    Do you have any advice as to how to get this to work?

    —————————————————————————
    MessageError Traceback (most recent call last)
    in ()
    24 else:
    25 get_ipython().system(‘mkdir -p {imagePath}’)
    —> 26 uploaded = files.upload()
    27 for filename in uploaded.keys():
    28 dst_path = imagePath + ‘/’ + filename

    3 frames
    /usr/local/lib/python3.10/dist-packages/google/colab/_message.py in read_reply_from_input(message_id, timeout_sec)
    101 ):
    102 if ‘error’ in reply:
    –> 103 raise MessageError(reply[‘error’])
    104 return reply.get(‘data’, None)
    105

    MessageError: RangeError: Maximum call stack size exceeded.

    1. Hi, I just tested uploading the sample images from the tutorial and it is working correctly. You should see upload screenshoot

      Did you:
      – Connect to your Google Drive?
      – Upload the 16 images of Andy Lau in the tutorial or your own?

      1. Thanks for the reply Andrew,
        I got it working. The drive was mounting, and I still had the same problem on 3 different Macs…. then I tried it in a different browser.

        Seems Safari was the problem. It worked perfectly fine in Chrome.
        Thanks for the great tutorial and support!

  30. A quick note for people having an issue with the following fatal error
    ImportError: cannot import name ‘StableDiffusionPipeline’ from ‘diffusers’ (E:\Py\env\lib\site-packages\diffusers\__init__.py)

    open the command option in Google Colab (at the bottom left)

    run these two commands
    pip uninstall diffusers
    pip install diffusers

    this fixed it for me

    source:
    https://stackoverflow.com/questions/73992681/importerror-cannot-import-name-stablediffusionpipeline-from-diffusers

    1. There does seem to be another issue

      having an issue fixing this one.

      Downloading (…)del.fp16.safetensors: 100% 1.72G/1.72G [01:28<00:00, 19.5MB/s]
      Fetching 11 files: 100% 11/11 [01:30<00:00, 8.24s/it]
      Loading pipeline components…: 100% 4/4 [00:01<00:00, 2.04it/s]
      Traceback (most recent call last):
      File "/content/kohya_ss/./sdxl_train_network.py", line 176, in
      trainer.train(args)
      File “/content/kohya_ss/train_network.py”, line 214, in train
      model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
      File “/content/kohya_ss/./sdxl_train_network.py”, line 37, in load_target_model
      ) = sdxl_train_util.load_target_model(args, accelerator, sdxl_model_util.MODEL_VERSION_SDXL_BASE_V1_0, weight_dtype)
      File “/content/kohya_ss/library/sdxl_train_util.py”, line 34, in load_target_model
      ) = _load_target_model(
      File “/content/kohya_ss/library/sdxl_train_util.py”, line 84, in _load_target_model
      pipe = StableDiffusionXLPipeline.from_pretrained(
      File “/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_utils.py”, line 1191, in from_pretrained
      raise ValueError(
      ValueError: Pipeline expected {‘vae’, ‘text_encoder’, ‘unet’, ‘tokenizer’, ‘scheduler’, ‘text_encoder_2’, ‘tokenizer_2’}, but only {‘vae’, ‘text_encoder’, ‘unet’, ‘tokenizer’, ‘scheduler’} were passed.
      Traceback (most recent call last):
      File “/usr/local/bin/accelerate”, line 8, in
      sys.exit(main())
      File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py”, line 47, in main
      args.func(args)
      File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py”, line 986, in launch_command
      simple_launcher(args)
      File “/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py”, line 628, in simple_launcher
      raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
      subprocess.CalledProcessError: Command ‘[‘/usr/bin/python3’, ‘./sdxl_train_network.py’, ‘–enable_bucket’, ‘–min_bucket_reso=256’, ‘–max_bucket_reso=2048’, ‘–pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5’, ‘–train_data_dir=/content/drive/MyDrive/AI_PICS/training/ChrisAllen’, ‘–resolution=512,650’, ‘–output_dir=/content/drive/MyDrive/AI_PICS/Lora’, ‘–network_alpha=64’, ‘–save_model_as=safetensors’, ‘–network_module=networks.lora’, ‘–text_encoder_lr=5e-05’, ‘–unet_lr=0.0001’, ‘–network_dim=64’, ‘–output_name=blah’, ‘–lr_scheduler_num_cycles=1’, ‘–no_half_vae’, ‘–learning_rate=0.0001’, ‘–lr_scheduler=constant’, ‘–train_batch_size=3’, ‘–max_train_steps=3500’, ‘–save_every_n_epochs=1’, ‘–mixed_precision=fp16’, ‘–save_precision=fp16’, ‘–seed=1234’, ‘–caption_extension=.txt’, ‘–cache_latents’, ‘–cache_latents_to_disk’, ‘–optimizer_type=AdamW’, ‘–max_data_loader_n_workers=1’, ‘–clip_skip=2’, ‘–bucket_reso_steps=64’, ‘–mem_eff_attn’, ‘–xformers’, ‘–bucket_no_upscale’, ‘–noise_offset=0.05′]’ returned non-zero exit status 1.
      03:18:00-173236 INFO There is no running process to kill.

  31. MessageError Traceback (most recent call last)
    in ()
    2 #@markdown Begineers: Use a different `Project_folder` each time when you upload the images.
    3 from google.colab import drive
    —-> 4 drive.mount(‘/content/drive’)
    5
    6 Project_folder = ‘AI_PICS/training/AndyLau’ #@param {type:”string”}

    3 frames
    /usr/local/lib/python3.10/dist-packages/google/colab/_message.py in read_reply_from_input(message_id, timeout_sec)
    101 ):
    102 if ‘error’ in reply:
    –> 103 raise MessageError(reply[‘error’])
    104 return reply.get(‘data’, None)
    105

    MessageError: Error: credential propagation was unsuccessful

    Why do I see this error message pop up?

  32. Thank you for the amazing tutorial. I had to go back on Gumroad and give you some money. That’s the one tutorial that was easy with amazing description to follow. This make having fun with AI for the common folk possible. Gradio did give me some issues with crashing, but otherwise, this has been a smooth ride. Thank you so much!

  33. any idea why I get “Unexpected token ‘<', " <!DOCTYPE "… is not valid JSON" this error in kohya when I try and caption and when I try and set folder for training?

  34. Is there a way to do it without the GPU? I tried a few times and used up the free version through stupid mistakes, and never managed to make my LoRA… now I’m out of GPU and stuck following this tutorial :/

    I’m on mac os for the record

      1. By the way, two questions:
        1. Why do you say that it is not necessary to crop images to 512*512px? All manuals on training LoRA recommend doing this.
        2. What’s the reason to change “a man” to “Andy Lau”? I didn’t change it and LORA successfully worked.

        1. 1. Those must be old guides. New trainers have a function called bucketing that can make use of images with different aspect ratios.
          2. Using “a man” would make all men look like Andy Lau. (which may not be a bad thing!)
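To illustrate the idea of bucketing mentioned in the reply above, here is a simplified sketch. It is not the trainer's actual implementation. The function names are hypothetical, and the defaults only mirror the `min_bucket_reso`, `max_bucket_reso`, and `bucket_reso_steps` values seen in the logs on this page. Each image is assigned to the bucket whose aspect ratio is closest to its own, so it does not have to be cropped to a square.

```python
def make_buckets(max_area=512 * 512, min_reso=256, max_reso=2048, step=64):
    """Enumerate (width, height) buckets whose area stays within max_area."""
    buckets = set()
    width = min_reso
    while width <= max_reso:
        # Largest height (a multiple of step) that keeps the area within budget.
        height = min(max_reso, (max_area // width) // step * step)
        if height >= min_reso:
            buckets.add((width, height))
            buckets.add((height, width))  # also add the rotated shape
        width += step
    return sorted(buckets)


def pick_bucket(image_width, image_height, buckets):
    """Choose the bucket whose aspect ratio is closest to the image's."""
    aspect = image_width / image_height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))


buckets = make_buckets()
print(pick_bucket(512, 512, buckets))    # (512, 512)
print(pick_bucket(1000, 1500, buckets))  # (384, 640)
print(pick_bucket(1920, 1080, buckets))  # (640, 384)
```

A portrait photo and a landscape photo end up in different buckets, so both can be used for training at roughly the same pixel budget.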

  35. OK, this is great, but my Google Drive is “My Drive”, with a space between “My” and “Drive”. I’m assuming this won’t work for me because the process won’t recognize the space? Is there another way around this? Apparently I can’t rename my Google Drive.

    1. hello! fantastic tutorial!

      i want to make a lora of my friend (i have his permission). i have some fantastic high-res color as well as black and white/grayscale images of him.

      my question is, can i include the black and white/grayscale images i have of him in the training images dataset?

      looking to hear from you!
