# Local Stable Diffusion

![astronaut rides horse](astronaut_rides_horse.png)

Stable Diffusion (SD) is an AI technique for generating images from text prompts.
Similar to DALL-E, which drives the popular [craiyon](https://www.craiyon.com/), SD is available as an [online tool](https://huggingface.co/spaces/stabilityai/stable-diffusion).
These web tools are amazing and easy to use, but they can be frustrating - they're often under high load, and impose long waiting times.
They use a good chunk of computational resources, specifically GPUs, and so running these models locally has generally been out of reach even for people with powerful personal machines.

Now, however, SD has reached the point where it can be run on (admittedly high-end) consumer video cards.
Stability AI - the model's developers - recently [published a blog post](https://stability.ai/blog/stable-diffusion-v2-release) open-sourcing SD 2.
There's a README for getting started [here](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/README.md), but it has a couple of gotchas and assumptions that will catch out plenty of people (like myself) who aren't already familiar with the technologies in use, such as Python and CUDA.

This post describes my experience setting up SD 2 on my local workstation.
For hardware, I have an i7-6700k, an RTX 2080 Super and 48GB of RAM.
If you have an AMD video card, you won't be able to use CUDA, but you may still be able to use GPU acceleration via something like ROCm.
In this post I'm using Arch Linux, but I have successfully set it up on Windows too.
Python is an exceedingly portable language, so it should work wherever you're able to get a Python installation.

This post assumes that you already have a working Python installation.
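If you want to double-check what you have before starting, a quick version check is enough - any reasonably recent Python 3 with pip available should do:

```bash
# confirm Python and pip are installed and on the PATH
python --version
python -m pip --version
```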
## Install CUDA
CUDA needs to be installed separately from Python dependencies.
It is quite large and, as with all NVIDIA driver installations, can be a bit confusing.
On Linux, it's straightforward to install it from your distribution's package manager.

```bash
sudo pacman -Syu
sudo pacman -S cuda
```
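Before moving on, it's worth checking that the driver and toolkit are actually visible - depending on your distribution, you may need to open a new shell (or reboot) before the CUDA binaries appear on your `PATH`:

```bash
# confirm the NVIDIA driver can see the GPU
nvidia-smi

# confirm which CUDA toolkit version was installed
nvcc --version
```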
On Windows, you will need to go to NVIDIA's site to download the correct version of CUDA.
At the time of writing, the SD 2 script expects CUDA 11.7, and will not work if you install the latest 12.0 version.
To get older versions, go to their [download archive](https://developer.nvidia.com/cuda-toolkit-archive) and select the appropriate one.
## Set up a virtual environment and PyTorch
Python packages can be installed at the system level, but it's usually a good idea to set up a virtual environment for your project.
This isolates the project's dependencies from the wider system, and makes your setup reproducible.
I will use [`pipenv`](https://pipenv.pypa.io/en/latest/index.html) as it's what I'm familiar with.
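If you don't already have `pipenv`, it's usually available either from your package manager or from pip - for example, on Arch or via a user-level pip install:

```bash
# either: install pipenv from the Arch repositories
sudo pacman -S python-pipenv

# or: install it for the current user with pip
python -m pip install --user pipenv
```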
PyTorch is a deep-learning framework, used to put together machine learning pipelines.

To get a command to install the relevant dependencies, go to [PyTorch's site](https://pytorch.org/get-started/locally/) and choose the options for your setup.
In my case, I replaced `pip3` with `pipenv` as I want to install dependencies to a new virtual environment instead of to the system.
```bash
mkdir stable-diffusion && cd stable-diffusion
pipenv install torch torchvision torchaudio
```
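Once that's finished, a quick one-liner confirms that the installed PyTorch build has CUDA support and can see the GPU - it should print the Torch version, the CUDA version it was built against, and `True`:

```bash
# check that PyTorch can reach the GPU from inside the virtual environment
pipenv run python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```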
## Install Stable Diffusion
SD 2 is provided by the `diffusers` package.
We can install it in our virtual environment as follows:

```bash
pipenv shell
pip3 install git+https://github.com/huggingface/diffusers.git transformers accelerate scipy
exit
```
We use `pipenv shell` to enter a shell using the virtual environment, before using the `pip3` command described in their README.
After installing the dependencies, we can leave the virtual environment shell and return to our original one.
`transformers` and `accelerate` are optional, but they reduce memory usage, so they're recommended.
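To make sure the packages landed in the environment (and that the git install of `diffusers` worked), you can try importing them before going any further:

```bash
# confirm the newly installed packages can be imported
pipenv run python -c "import diffusers, transformers, accelerate, scipy; print(diffusers.__version__)"
```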
## Create a Python script
Python does have an interactive environment, but to save our fingers let's use a `stable-diffusion.py` script to contain and run our Python code.
Here I'll mostly copy the Python included in their README:
```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2"

# Use the Euler scheduler here instead
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, height=768, width=768).images[0]

image.save("astronaut_rides_horse.png")
```
I've made two additions here.
First, I've added `import torch` at the top - I'm not sure why the code in the README omits this, but it's needed for the script to work.

I've also added `pipe.enable_attention_slicing()` - this switches to a more memory-efficient running mode, which is less intensive at the cost of taking longer.
If you have a monster video card, this may not be necessary.

At this point, we're done - after running the script successfully, you should have a new picture of an astronaut riding a horse on Mars.
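Since the dependencies live inside the virtual environment, the script needs to be run through it - either from within `pipenv shell`, or directly with `pipenv run`.
Expect the first run to take a while, as the model weights (several gigabytes) are downloaded from Hugging Face:

```bash
# run the script inside the pipenv environment
pipenv run python stable-diffusion.py
```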
## Some nice-to-haves
In this basic script we only have the one, hardcoded prompt.
To change it, we need to update the file itself.
Instead, we can change how `prompt` is set, and have it read from command-line arguments:
```python
# at the top of the file
import sys

...

prompt = " ".join(sys.argv[1:])
```
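Because the arguments are joined with spaces, the prompt can simply be written after the script name - quoting it is only needed if it contains characters the shell would otherwise interpret (the prompt below is just an example):

```bash
# pass the prompt on the command line
pipenv run python stable-diffusion.py a psychedelic painting of a lighthouse in a thunderstorm
```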
While we're at it, we can also base the filename on the input prompt:
```python
image.save(f"{prompt.replace(' ', '_')}.png")
```
## Wrapping up
And that's it!
Enjoy making some generative art.
My favourites so far have been prefixing "psychedelic" to things.
I've also been enjoying generating descriptions with [ChatGPT](https://chat.openai.com/chat) and plugging them into SD, for some zero-effort creativity.
As always, if anything's out of place or if you'd like to get in touch, please [send me an email](mailto:me@ktyl.dev)!