diff --git a/blogs/2022/12/15/astronaut_rides_horse.png b/blogs/2022/12/15/astronaut_rides_horse.png
new file mode 100644
index 0000000..f536320
--- /dev/null
+++ b/blogs/2022/12/15/astronaut_rides_horse.png
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0dd017d199ee452d240c8d5c2de9aa954328a3dab101c534773b584059e0e59f
+size 1081184
diff --git a/blogs/2022/12/15/stable-diffusion.md b/blogs/2022/12/15/stable-diffusion.md
new file mode 100644
index 0000000..150e96e
--- /dev/null
+++ b/blogs/2022/12/15/stable-diffusion.md
@@ -0,0 +1,105 @@
# Local Stable Diffusion

Stable Diffusion (SD) is an AI technique for generating images from text prompts.
Like DALL-E, which drives the popular [craiyon](https://www.craiyon.com/), SD is available as an [online tool](https://huggingface.co/spaces/stabilityai/stable-diffusion).
These web tools are amazing and easy to use, but they can be frustrating - they're often under high load and impose long waiting times.
Models like SD use a good chunk of computational resources, specifically GPUs, and so running them has generally been out of reach, even for people with powerful personal machines.

Now, however, SD has reached the point where it can be run on (admittedly high-end) consumer video cards.
Stability AI - the model's developers - recently [published a blog post](https://stability.ai/blog/stable-diffusion-v2-release) open-sourcing SD 2.
There's a README for getting started [here](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/README.md), but it has a couple of gotchas and makes some assumptions that will trip up anyone (like myself) who isn't already familiar with the technologies in use, such as Python and CUDA.

This post describes my experience setting up SD 2 on my local workstation.
For hardware, I have an i7-6700k, an RTX 2080 Super, and 48GB of RAM.
If you have an AMD video card, you won't be able to use CUDA, but you may still be able to use GPU acceleration via something like ROCm.
In this post I'm using Arch Linux, but I have successfully set it up on Windows too.
Python is an exceedingly portable language, so the steps should work wherever you're able to get a Python installation.

This post assumes that you already have a working Python installation.

## Install CUDA

CUDA needs to be installed separately from the Python dependencies.
It is quite large and, as with all NVIDIA driver installations, can be a bit confusing.
On Linux, it's straightforward to install from your distribution's package manager.

```bash
sudo pacman -Syu
sudo pacman -S cuda
```

On Windows, you will need to go to NVIDIA's site to download the correct version of CUDA.
At the time of writing, the SD 2 script expects CUDA 11.7, and will not work if you install the latest 12.0 version.
To get older versions, go to their [download archive](https://developer.nvidia.com/cuda-toolkit-archive) and select the appropriate one.

## Set up a virtual environment and PyTorch

Python dependencies can be installed at a system level, but it's usually a good idea to set up a virtual environment for each project.
This isolates the project's dependencies from the wider system and makes your setup reproducible.
I will use [`pipenv`](https://pipenv.pypa.io/en/latest/index.html) as it's what I'm familiar with.

PyTorch is a deep-learning framework used to put together machine learning pipelines.
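
Once PyTorch is installed (the commands follow below), it's worth checking that it can actually see your GPU before you start downloading gigabytes of model weights.
Here's a minimal sanity check - just a sketch, assuming the `torch` package installed in the next step:

```python
import torch

# True only if PyTorch was built with CUDA support and can reach a compatible driver.
print(torch.cuda.is_available())

# If CUDA is available, show which GPU will be used.
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```

If this prints `False`, the usual culprits are a missing or mismatched CUDA installation, or an outdated driver.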

To get a command to install the relevant dependencies, go to [PyTorch's site](https://pytorch.org/get-started/locally/) and choose the options for your setup.
In my case, I replaced `pip3` with `pipenv`, as I want to install dependencies into a new virtual environment instead of into the system.

```bash
mkdir stable-diffusion && cd stable-diffusion
pipenv install torch torchvision torchaudio
```

## Install Stable Diffusion

SD 2 is provided by the `diffusers` package.
We can install it in our virtual environment as follows:

```bash
pipenv shell
pip3 install git+https://github.com/huggingface/diffusers.git transformers accelerate scipy
exit
```

We use `pipenv shell` to enter a shell inside the virtual environment before running the `pip3` command described in their README.
After installing the dependencies, we can leave the virtual environment's shell and return to our original one.
`transformers` and `accelerate` are optional, but they reduce memory usage, so they're recommended.

## Create a Python script

Python does have an interactive environment, but to save our fingers let's use a `stable-diffusion.py` script to contain and run our Python code.
Here I'll mostly copy the Python included in their README:

```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2"

# Use the Euler scheduler here instead
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, height=768, width=768).images[0]

image.save("astronaut_rides_horse.png")
```

I've made two additions here.
First, I've added `import torch` at the top - I'm not sure why the code in the README omits this, but it's needed for the script to work.

I've also added `pipe.enable_attention_slicing()` - this switches to a more memory-efficient running mode, which is less intensive at the cost of taking longer.
If you have a monster video card, this may not be necessary.

At this point, we're done - after the script runs successfully (for example, via `pipenv run python stable-diffusion.py`), you should have a new picture of an astronaut riding a horse on Mars.
Here's mine!

![astronaut rides horse](astronaut_rides_horse.png)

## Some nice-to-haves


## Wrapping up
