Compare commits
2 Commits: b7c2193ba5 ... 5f8a0cefe5

| Author | SHA1 | Date |
|---|---|---|
| ktyl | 5f8a0cefe5 | |
| ktyl | 89f56103f3 | |
@@ -0,0 +1 @@
*.png filter=lfs diff=lfs merge=lfs -text

Binary file not shown.
@@ -0,0 +1,128 @@
# Local Stable Diffusion

![astronaut rides horse](astronaut_rides_horse.png)

Stable diffusion (SD) is an AI technique for generating images from text prompts.
Similar to DALL-E, which drives the popular [craiyon](https://www.craiyon.com/), SD is available as an [online tool](https://huggingface.co/spaces/stabilityai/stable-diffusion).
These web tools are amazing and easy to use, but they can be frustrating - they're often under high load and impose long waiting times.
They also use a good chunk of computational resources, specifically GPUs, and so have generally been out of reach even for people with powerful personal machines.

Now, however, SD has reached the point where it can be run on (admittedly high-end) consumer video cards.
Stability AI - the model's developers - recently [published a blog post](https://stability.ai/blog/stable-diffusion-v2-release) open-sourcing SD 2.
There's a README for getting started [here](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/README.md), but it has a couple of gotchas and assumptions which plenty of people (like myself) won't know about if they're not already familiar with the technologies in use, such as Python and CUDA.

This post describes my experience setting up SD 2 on my local workstation.
For hardware, I have an i7-6700k, an RTX 2080 Super and 48GB of RAM.
If you have an AMD video card, you won't be able to use CUDA, but you may be able to use GPU acceleration regardless using something like ROCm.
In this post I'm using Arch Linux, but I have successfully set it up on Windows too.
Python is an exceedingly portable language, so it should work wherever you're able to get a Python installation.

This post assumes that you already have a working Python installation.
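A quick way to check is to ask the interpreter for its version (any reasonably recent Python 3 should do):

```bash
python --version
```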
## Install CUDA

CUDA needs to be installed separately from the Python dependencies.
It is quite large and, as with all NVIDIA driver installations, can be a bit confusing.
On Linux, it's straightforward to install it from your distribution's package manager.

```bash
sudo pacman -Syu
sudo pacman -S cuda
```

On Windows, you will need to go to NVIDIA's site to download the correct version of CUDA.
At the time of writing, the SD 2 script expects CUDA 11.7, and will not work if you install the latest 12.0 version.
To get older versions, go to their [download archive](https://developer.nvidia.com/cuda-toolkit-archive) and select the appropriate one.
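Whichever platform you're on, you can confirm which toolkit version actually got installed by asking the CUDA compiler for its version (on Arch it lives under `/opt/cuda/bin` if it isn't already on your `PATH`):

```bash
nvcc --version
```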
## Set up a virtual environment and PyTorch

Python can be installed at a system level, but it's usually a good idea to set up a virtual environment for your project.
This isolates the project dependencies from the wider system and makes your setup reproducible.
I will use [`pipenv`](https://pipenv.pypa.io/en/latest/index.html) as it's what I'm familiar with.

PyTorch is a deep-learning framework used to put together machine learning pipelines.

To get a command to install the relevant dependencies, go to [PyTorch's site](https://pytorch.org/get-started/locally/) and choose the options for your setup.
In my case, I replaced `pip3` with `pipenv` as I want to install dependencies to a new virtual environment instead of to the system.

```bash
mkdir stable-diffusion && cd stable-diffusion
pipenv install torch torchvision torchaudio
```
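Before moving on, it's worth checking that the PyTorch build we just installed can actually see the GPU. This one-liner should print `True` if CUDA is usable:

```bash
pipenv run python -c "import torch; print(torch.cuda.is_available())"
```

If it prints `False`, the CUDA installation from the previous step is the first thing to double-check.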
## Install Stable Diffusion

SD 2 is provided by the `diffusers` package.
We can install it in our virtual environment as follows:

```bash
pipenv shell
pip3 install git+https://github.com/huggingface/diffusers.git transformers accelerate scipy
exit
```

We use `pipenv shell` to enter a shell inside the virtual environment, before using the `pip3` command described in their README.
After installing the dependencies, we can leave the virtual environment shell and return to our original one.
`transformers` and `accelerate` are optional, but they reduce memory usage and so are recommended.
## Create a Python script

Python does have an interactive environment, but to save our fingers let's use a `stable-diffusion.py` script to contain and run our Python code.
Here I'll mostly copy the Python included in their README:

```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2"

# Use the Euler scheduler here instead
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, height=768, width=768).images[0]

image.save("astronaut_rides_horse.png")
```

I've made two additions here.
First, I've added `import torch` at the top - I'm not sure why the code in the README omits this, but it's needed for the script to work.

I've also added `pipe.enable_attention_slicing()` - this enables a more memory-efficient mode, which is less intensive at the cost of taking longer.
If you have a monster video card, this may not be necessary.

At this point, we're done - after running the script successfully, you should have a new picture of an astronaut riding a horse on Mars.
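To run it without entering the virtual environment shell each time, `pipenv run` works here too - this assumes the script was saved as `stable-diffusion.py` in the project directory:

```bash
pipenv run python stable-diffusion.py
```

The first run will also download the model weights from Hugging Face, so expect it to take a while.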
## Some nice-to-haves

In this basic script we only have the one hardcoded prompt.
To change it, we need to update the file itself.
Instead, we can change how `prompt` is set and have it read from command-line arguments.

```python
# at the top of the file
import sys

...

prompt = " ".join(sys.argv[1:])
```

While we're at it, we can also base the filename on the input prompt:

```python
image.save(f"{prompt.replace(' ', '_')}.png")
```
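With both changes in place, a prompt can be passed straight from the shell, and the output filename follows it - for example:

```bash
pipenv run python stable-diffusion.py a photo of an astronaut riding a horse on mars
# saves a_photo_of_an_astronaut_riding_a_horse_on_mars.png
```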
## Wrapping up

And that's it!
Enjoy making some generative art.
My favourites so far have been prefixing "psychedelic" to things.
I've also been enjoying generating descriptions with [ChatGPT](https://chat.openai.com/chat) and plugging them into SD, for some zero-effort creativity.
As always, if anything's out of place or if you'd like to get in touch, please [send me an email](mailto:me@ktyl.dev)!