On August 22, 2022, the London startup Stability AI released the weight file for an image generation model called Stable Diffusion v1.4. A single 4GB `.ckpt` file. The moment it hit GitHub and Hugging Face, "image generation AI" went from something behind the cloud to software you could download to your own PC. Neither Midjourney nor DALL·E 2 would do that at the time.

Almost four years on, Stable Diffusion has reached SD 3.5 Large (8.1 billion parameters), and Civitai hosts over 100,000 custom models and LoRAs. Meanwhile, the licensing blowback around SD3's release pushed much of the community toward FLUX, built by Black Forest Labs, the company founded by members of the original SD team, and FLUX has overtaken the parent in quality. The picture is no longer simple.

My stance up front. If "Midjourney is fine" works for you, don't force yourself into Stable Diffusion. But if any of these apply — "I want to keep the same character consistent across 100 images," "I want to mix in my own confidential data locally," "I want my monthly cost to be $0," "I need an open model I can disclose for commercial work" — then SD is unavoidable. This article covers how SD works, its version history, hardware requirements, licensing, ecosystem, and how to choose, all as of May 2026.

Stable Diffusion · Open-Source Image AI

Four Things That Make It Different

— What Midjourney, DALL·E, and Firefly will never give you

① OPEN WEIGHTS: Weight files are distributed. Download .safetensors directly from Hugging Face. Midjourney doesn't even expose an API.
② LOCAL FIRST: Runs on your own GPU. Practical from RTX 3060 (12GB) up. Generated data stays on your machine.
③ FINE-TUNE: Modify freely with LoRA. 100,000+ LoRAs and custom models on Civitai: anime, photoreal, specific characters, anything.
④ ZERO COST: Free beyond electricity. After the upfront GPU, every image is $0. Commercial use is also OK with conditions.

In other words, this is the image AI for people who want freedom from cloud dependence, black boxes, and monthly subscriptions.
The price you pay in return: a GPU, setup time, and prompt trial-and-error.

1. August 22, 2022 — The Day Image AI Became Something You Could Download

At the time, the image generation AI scene was a two-horse race: OpenAI's DALL·E 2 (invite-only beta) and Midjourney V3 (Discord-only). Both were cloud-only, and both kept their weights completely hidden. What their AI learned, how it ran, what it could and couldn't generate — all of it was at the vendor's discretion.

Then Stability AI made a choice nobody expected: release the weight file itself. A diffusion model trained on LAION-5B (5.8 billion image-text pairs), inference code under MIT, weights under CreativeML Open RAIL-M (commercial use OK, with only use-based restrictions). Within a week, engineers worldwide had it running in Google Colab and a local WebUI (later AUTOMATIC1111) was born. Within a few months Civitai launched, and the personalization of AI art took off.

The remarkable thing wasn't the technical leap so much as the precedent: "image generation AI is something individuals can own and modify." If you want an LLM analogy, the shock was close to Llama 2 and Llama 3 dropping with "commercial use OK." Ever since, the image AI industry has run two parallel tracks: "closed and high quality" (MJ/DALL·E) and "open and freely customizable" (the SD family).

2. What Is Stable Diffusion — In Three Lines

Stable Diffusion is an open-weight, diffusion-model-based image generation AI released by Stability AI. Three-line breakdown:

① HOW IT WORKS: Starts from a random noise image, then gradually denoises it to match your text prompt. Takes 20–50 steps.
② ARCHITECTURE: A three-part stack: a Text Encoder (CLIP/T5) that interprets the prompt, a U-Net/DiT that does the denoising, and a VAE that compresses/decompresses the image.
③ DISTRIBUTION: Weight files (.safetensors, 2GB–16GB) are freely downloadable from Hugging Face. Run them on a local GPU or via cloud inference services.
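To make the VAE's role in that stack concrete, here is a small arithmetic sketch. It assumes the commonly cited SD 1.x/SDXL layout, an 8× spatial downsample into 4 latent channels. Those two defaults and the function names are assumptions for illustration, not figures from this article, and newer models use different latent layouts.

```python
def latent_shape(width, height, downsample=8, latent_channels=4):
    """Return (channels, height, width) of the latent the denoiser works on."""
    if width % downsample or height % downsample:
        raise ValueError("image sides must be multiples of the downsample factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(width, height, downsample=8, latent_channels=4):
    """How many times fewer values the latent holds than the RGB image."""
    channels, lat_h, lat_w = latent_shape(width, height, downsample, latent_channels)
    rgb_values = width * height * 3
    return rgb_values / (channels * lat_h * lat_w)


print(latent_shape(1024, 1024))       # (4, 128, 128)
print(compression_ratio(1024, 1024))  # 48.0
```

Under these assumptions the denoiser works on roughly 48× fewer values than the final image, which is a large part of why this design fits on consumer GPUs.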

The thing I think actually matters is what "diffusion model" means in plain terms. In the GAN era (StyleGAN and friends), a generator and a discriminator fought each other to produce images. Diffusion models took a different path: "start from a noisy image and gradually subtract noise." It is a simpler idea, but it turned out to train far more stably than GANs and to scale better to high-resolution output. That insight is the core of SD's success, and almost every major image AI of this generation (Imagen, DALL·E 3, FLUX) is also diffusion-based.
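The "start from noise and gradually subtract it" loop can be sketched in a few lines. This is a toy, deterministic DDIM-style sampler over a short list of numbers, not SD's actual sampler. The linear noise schedule and the `oracle_noise_prediction` function are assumptions for illustration. The oracle stands in for the trained U-Net/DiT and simply knows the target, so the loop's structure is visible without a neural network.

```python
import math
import random


def oracle_noise_prediction(x_t, target, alpha_bar):
    """Stand-in for the trained denoiser: reports which part of x_t is noise."""
    return [(x - math.sqrt(alpha_bar) * g) / math.sqrt(1 - alpha_bar)
            for x, g in zip(x_t, target)]


def denoise(target, num_steps=30, seed=0):
    """Walk from pure noise to the target in num_steps DDIM-style updates."""
    rng = random.Random(seed)
    # alpha_bar is the fraction of signal left at step t: 1.0 at t=0, ~0 at t=num_steps.
    alpha_bars = [1.0 - 0.999 * t / num_steps for t in range(num_steps + 1)]
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from random noise
    for t in range(num_steps, 0, -1):
        a_t, a_prev = alpha_bars[t], alpha_bars[t - 1]
        eps = oracle_noise_prediction(x, target, a_t)
        # Estimate the clean signal implied by the predicted noise.
        x0_pred = [(xi - math.sqrt(1 - a_t) * e) / math.sqrt(a_t)
                   for xi, e in zip(x, eps)]
        # Re-noise that estimate to the slightly cleaner level of step t-1.
        x = [math.sqrt(a_prev) * x0 + math.sqrt(1 - a_prev) * e
             for x0, e in zip(x0_pred, eps)]
    return x


target = [0.2, -0.5, 0.9, 0.0]
print([round(v, 6) for v in denoise(target)])
```

With a perfect oracle the loop lands on the target trivially. In a real model the noise prediction comes from a network conditioned on the text prompt, and its imperfections are why step count and sampler choice affect the result.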

3. Version Lineage — SD1.5 / SDXL / SD3.5 and the FLUX Split

The most confusing thing about SD's history is "which version should I actually use?" Each generation differs in performance, license, recommended GPU, and LoRA ecosystem. Let's lay it out.

| Version | Released | Parameters | Recommended VRAM | Characteristics |
|---|---|---|---|---|
| SD 1.5 | Oct 2022 | 0.9B | 4–8GB | Lightest, most LoRAs, strongest on anime. Still mainstream on Civitai |
| SD 2.x | Nov 2022 | 0.9B | 6–8GB | Effectively skip. Reduced training data, poor reception, never caught on |
| SDXL 1.0 | Jul 2023 | 3.5B | 8–12GB | 1024×1024 standard. The go-to for photoreal and commercial design. Second-largest LoRA pool |
| SD 3 Medium | Jun 2024 | 2B | 8–12GB | License blowback caused developer exodus. Widely seen as a failure |
| SD 3.5 Medium | Oct 2024 | 2.5B | 9.9GB | Redemption for SD3. MMDiT-X architecture, designed for consumer PCs |
| SD 3.5 Large | Oct 2024 | 8.1B | 18GB (11GB in FP8) | The flagship quality. Aimed at RTX 4090 class |
| FLUX.1 dev | Aug 2024 | 12B | 12–24GB | From Black Forest Labs, founded by ex-SD developers. Widely rated above SD itself |

Bottom line: if you're starting today, it's mostly a two-way pick between SDXL and FLUX.1 dev. SD 1.5 is light and has the most LoRAs, but it's a generation behind on quality. SD 3.5 Large is heavy and overshadowed by FLUX in the same weight class. The practical sorting is: SDXL for commercial design, FLUX for top quality, and SD 3.5 Medium for the lightest current-generation local setup.
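A rough way to sanity-check the VRAM column is to multiply parameter count by bytes per parameter. The sketch below does only that. The precision table and function name are assumptions for illustration, and the result covers the listed weights alone, not text encoders, the VAE, or activations, which is why the recommended VRAM figures above are higher.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def weight_gigabytes(params_billion, precision="fp16"):
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


MODELS = [("SD 1.5", 0.9), ("SDXL 1.0", 3.5), ("SD 3.5 Large", 8.1), ("FLUX.1 dev", 12.0)]

for name, params in MODELS:
    fp16 = weight_gigabytes(params, "fp16")
    fp8 = weight_gigabytes(params, "fp8")
    print(f"{name}: {fp16:.1f} GB in fp16, {fp8:.1f} GB in fp8")
```

For SD 3.5 Large this gives about 16.2 GB in FP16 and 8.1 GB in FP8, which lines up with the 18GB and 11GB figures once the rest of the pipeline is loaded.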

FLUX's arrival has an ironic backstory. By the time of the SD3 licensing fiasco (more below), much of the original SD team had already left Stability AI. They set up Black Forest Labs in Germany and launched FLUX.1 weeks later: "a higher-quality SD successor," coming from the people who built SD in the first place. From the community's perspective, plenty of people now see FLUX, not SD 3.x, as the rightful heir.

4. The Reality of Running It Locally — By VRAM Tier

"Runs locally" is one thing; what your specific PC can actually do is another. Here's what I've seen in practice.

4–6GB (GTX 1660 / RTX 3050): Barely-works tier. SD 1.5 only. 20–60 sec per image. SDXL and above are rough.
8GB (RTX 3060 Ti / 4060): Minimum practical line. SDXL runs with memory optimization. 15–30 sec per 1024px image.
12GB (RTX 3060 12GB / 4070): Comfortable tier. SDXL/SD 3.5 Medium with headroom. Stack LoRAs freely. 5–15 sec per image.
16–24GB (RTX 4080 / 4090): Serious production setup. FLUX/SD 3.5 Large with headroom. You can train your own LoRAs. 2–8 sec per image.

Note: 16GB+ system RAM and 100GB+ of free SSD space are also needed. Macs can run it through the MPS backend on Apple Silicon, but expect generation to be 3–5× slower than on NVIDIA.
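The tiers above can be written as a small lookup, sketched here. The `TIERS` table and `tier_for` function are hypothetical names, the labels and model lists come from this section, and mapping in-between values (for example 10GB) to the tier below is an assumption.

```python
# (minimum VRAM in GB, tier label, models that run at that tier), highest tier first.
TIERS = [
    (16, "Serious production setup", ["FLUX", "SD 3.5 Large", "SDXL", "SD 3.5 Medium", "SD 1.5"]),
    (12, "Comfortable tier", ["SDXL", "SD 3.5 Medium", "SD 1.5"]),
    (8, "Minimum practical line", ["SDXL (with memory optimization)", "SD 1.5"]),
    (4, "Barely-works tier", ["SD 1.5"]),
]


def tier_for(vram_gb):
    """Return (label, models) for the highest tier whose minimum VRAM is met."""
    for min_gb, label, models in TIERS:
        if vram_gb >= min_gb:
            return label, models
    return "Below the listed tiers", []


for vram in (6, 8, 12, 24):
    label, models = tier_for(vram)
    print(f"{vram}GB -> {label}: {', '.join(models)}")
```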

No sugarcoating: if you want to seriously touch SD today, the realistic entry points are an RTX 3060 12GB (around $200 used) or an RTX 4070 (around $600 new). 8GB GPUs work, but you're walking into a swamp of optimization flags and quantization — not what I'd recommend to a beginner. If you don't want to buy a GPU, the right move is cloud inference services (Runpod / Replicate / Civitai's own hosting) at roughly $0.001–$0.01 per image.
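To put the buy-versus-cloud choice into numbers, here is a break-even sketch using the prices quoted above. The function names are hypothetical, and `local_cost_per_image_usd` (electricity) defaults to zero as a simplifying assumption.

```python
def break_even_images(gpu_price_usd, cloud_price_per_image_usd, local_cost_per_image_usd=0.0):
    """Approximate image count at which buying a GPU matches paying per image."""
    saving = cloud_price_per_image_usd - local_cost_per_image_usd
    if saving <= 0:
        return None  # local generation never catches up under these numbers
    return round(gpu_price_usd / saving)


def months_to_break_even(gpu_price_usd, cloud_price_per_image_usd, images_per_month):
    """Months of steady use needed to reach the break-even image count."""
    images = break_even_images(gpu_price_usd, cloud_price_per_image_usd)
    if images is None:
        return None
    return images / images_per_month


print(break_even_images(200, 0.01))          # 20000
print(break_even_images(200, 0.001))         # 200000
print(months_to_break_even(200, 0.01, 500))  # 40.0
```

On per-image price alone, a used RTX 3060 takes tens of thousands of images to pay back, so at hobby volumes the other reasons in this article (LoRA training, keeping data local) tend to carry the decision.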

5. The License Trap — Lessons from the SD3 Backlash

"It's open source, so commercial use is fine" is not the simple statement people want it to be with SD. The license depends on the version.

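SD 1.5 / SDXL: CreativeML Open RAIL-M (SDXL ships under the closely related Open RAIL++-M). No revenue cap. Commercial use is almost entirely free. Only restrictions concern illegal or harmful use.
SD 3 / SD 3.5: Stability AI Community License (with $1M revenue cap). Individuals and organizations under $1M in annual revenue can use it commercially. Above that, an Enterprise contract is required.
FLUX.1 dev: Black Forest Labs' own FLUX.1 [dev] Non-Commercial License, not a Stability AI license. Commercial use of the model needs a separate license from Black Forest Labs, so check the current terms before building on it.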

Individual bloggers, freelancers, and early-stage startups are all clear. A commercial agreement is only needed when a large enterprise embeds it in a product. Selling the generated images themselves is unlimited — no matter how many you generate or sell, you owe Stability AI nothing
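The Stability AI rules in this section reduce to a short check, sketched below. It encodes only what the section states for Stability AI's own models. The names are hypothetical, treating exactly $1M as over the cap is an assumption, and FLUX is left out because its terms are set by Black Forest Labs.

```python
REVENUE_CAP_USD = 1_000_000

# License family per model, as described in this section (Stability AI models only).
LICENSES = {
    "SD 1.5": "CreativeML Open RAIL-M",
    "SDXL": "CreativeML Open RAIL-M",
    "SD 3": "Community License",
    "SD 3.5": "Community License",
}


def needs_enterprise_contract(model, annual_revenue_usd):
    """True if, per this section, commercial use requires an Enterprise contract."""
    license_name = LICENSES[model]  # KeyError for models this sketch does not cover
    if license_name == "CreativeML Open RAIL-M":
        return False  # no revenue cap
    return annual_revenue_usd >= REVENUE_CAP_USD


print(needs_enterprise_contract("SDXL", 5_000_000))    # False
print(needs_enterprise_contract("SD 3.5", 200_000))    # False
print(needs_enterprise_contract("SD 3.5", 2_000_000))  # True
```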

When SD 3 dropped in June 2024, its license was so harsh (usage-based fees per generated image, a ban on Civitai distribution of derivatives) that Civitai publicly refused to host SD3 derivatives. The community declared "SD is dead," and when Black Forest Labs shipped FLUX that August, many developers and users moved over. Stability AI then massively loosened the terms (the current $1M revenue version, which SD 3.5 also launched under in October), but as of May 2026, community trust has not fully recovered.

Practical advice: "Just use SDXL" is the version that bites least. CreativeML Open RAIL-M means no revenue cap, the LoRA pool is huge, and the ecosystem is mature. Move to SD 3.5 or FLUX only when SDXL stops being enough.

6. Civitai / LoRA / ComfyUI — An Ecosystem Bigger Than the Model

Talking about Stable Diffusion as "just the model" misses the point. SD's strength is the surrounding ecosystem.

Civitai: Model distribution hub. 100,000+ checkpoints, LoRAs, embeddings. Anime, photoreal, specific characters, specific poses, anything.
LoRA: Add-on training file. Small 50–300MB files that add a style or character to a base model. Stack them to combine effects.
ComfyUI: Node-based UI. The pro's choice. Build complex workflows visually (ControlNet → upscale → Inpaint chains, etc.).
A1111: Beginner-friendly WebUI. AUTOMATIC1111's project. Form-based and intuitive. How most SD users first got in.
ControlNet: Composition control. Specify composition with a pose image, line drawing, or depth map. Midjourney has no equivalent at this precision.
IP-Adapter: Image reference. Copy a reference image's style, face, or outfit onto a new image. Essential for character consistency.
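Why is a LoRA so much smaller than a checkpoint? The usual formulation, assumed here since the article doesn't spell it out, keeps the base weight matrix frozen and stores two thin matrices whose product is added on top. The sketch below uses plain Python lists and hypothetical function names to show the update and the parameter savings.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_down, lora_up, scale=1.0):
    """Return weight + scale * (lora_up @ lora_down), leaving weight untouched."""
    delta = matmul(lora_up, lora_down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


def lora_param_count(d_out, d_in, rank):
    """Values stored by a rank-r LoRA for one d_out x d_in weight matrix."""
    return rank * (d_out + d_in)


base = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 base weight
up = [[1.0], [2.0]]              # 2x1 "up" matrix (rank 1)
down = [[0.5, 0.0]]              # 1x2 "down" matrix
print(apply_lora(base, down, up))                 # [[1.5, 0.0], [1.0, 1.0]]
print(lora_param_count(1280, 1280, 16), "vs", 1280 * 1280)
```

Stacking LoRAs is then just applying several such updates to the same base, each with its own `scale`, which is what the strength sliders in the UIs control.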

One caveat. SD 1.5 LoRAs don't load on SDXL; SDXL LoRAs don't load on FLUX. Each base model is its own ecosystem. If the LoRAs you love on Civitai are all SD 1.5, switching to SDXL means abandoning them. When searching on Civitai, always check the "Base Model" filter.

7. Midjourney vs Stable Diffusion — Which to Pick

People often ask "which is better, SD or Midjourney/DALL·E?" — but that's the wrong axis. Go with Midjourney for quality, go with SD for freedom and ownership. Different roles entirely.

| Aspect | Midjourney V8 | Stable Diffusion (SDXL/FLUX) |
|---|---|---|
| Ease of use | ◎ Just write the prompt | △ Setup required |
| Default quality | ◎ Best artistic look in the industry | ○ Depends on model (FLUX is on par) |
| Composition control | △ Prompt only | ◎ Full control via ControlNet |
| Character consistency | ○ Character Reference | ◎ Train a LoRA, replicate perfectly |
| Monthly cost | $10–$120 | $0 (local) or pay-per-use |
| Commercial use | OK on paid plans | SDXL unlimited; SD 3.5 has $1M cap; FLUX.1 dev has its own non-commercial license |
| Data privacy | × Cloud-bound | ◎ Can stay local end-to-end |
| Learning curve | Hours | Days to weeks |

The clean read: for "make a single pretty image," Midjourney. $10/month and no setup hell. For "I want 100 images of the same character," "I want to mix in proprietary data," "I want to pay nothing per image at any volume," or "I want to reproduce a specific anime style," Stable Diffusion. Neither is "better." Plenty of pros use both (an illustrator I know roughs out composition in MJ and finishes in SD).

8. Three Pitfalls — Copyright, NSFW, Compatibility

Three things you'll hit using SD that are worth knowing up front.

Pitfall ①: Training-data copyright risk

SD's base models are trained on LAION-5B (5.8 billion images scraped from the internet). Inevitably, copyrighted works are in there in large numbers. Getty Images is currently suing Stability AI (filed 2023, ongoing in both US and UK), and "specific artist style" LoRAs on Civitai have gotten visibly greyer since 2025. For commercial work, minimum hygiene: don't prompt by specific artist names, and even on Civitai LoRAs, avoid public figures or works modeled on identifiable copyright holders. If "commercial safety" is non-negotiable, Adobe Firefly is the alternative.

Pitfall ②: NSFW generation is trivially easy

Because SD has open weights, disabling the SafetyChecker means sexual or violent images are easy to generate. Civitai openly hosts many NSFW models. The technology itself is neutral, but creation or distribution of generated content involving minors is illegal in many countries (Japan currently has legislation under discussion). Never do this on a work PC during work hours — logs and network traffic make it trivial to spot. Even on a home PC, certain categories are illegal to create or even store. Self-awareness is mandatory.

Pitfall ③: Generational compatibility splits

As covered above, SD1.5 / SDXL / SD3.5 / FLUX are each their own ecosystem. LoRAs, embeddings, and ControlNet models don't cross-load. "Let me upgrade to SDXL" can mean discovering 50 SD1.5 LoRAs you can't use anymore. If you're starting out, pick one (SDXL or FLUX) and stay within that ecosystem — it's actually more efficient in the long run.
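Before switching base models, it helps to check which downloaded files will survive the move. The sketch below is a hypothetical inventory check. The `Asset` class and its field names are made up for illustration, and it assumes you record each file's base model yourself (for example from Civitai's "Base Model" field).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    kind: str        # e.g. "lora", "embedding", "controlnet"
    base_model: str  # e.g. "SD 1.5", "SDXL", "SD 3.5", "FLUX"


def split_by_compatibility(assets, checkpoint_base):
    """Return (usable, stranded) assets for the given checkpoint's base model."""
    usable = [a for a in assets if a.base_model == checkpoint_base]
    stranded = [a for a in assets if a.base_model != checkpoint_base]
    return usable, stranded


library = [
    Asset("watercolor-style", "lora", "SD 1.5"),
    Asset("studio-portrait", "lora", "SDXL"),
    Asset("openpose", "controlnet", "SDXL"),
]
usable, stranded = split_by_compatibility(library, "SDXL")
print("usable:", [a.name for a in usable])
print("stranded:", [a.name for a in stranded])
```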

Summary

Essence: The revolution that turned image AI into "software individuals can own and modify." Provides freedoms MJ/DALL·E don't.
Entry point: RTX 3060 12GB + SDXL + A1111 is the realistic start. No GPU? Use Runpod from $0.001/image.
Use which: Most people: Midjourney. Choose SD only if you need "100 of the same character," "private data," or "electricity-only costs."
Caution: Copyright, NSFW, and compatibility splits are the three things to know early. Start commercial work on SDXL (no revenue cap).

Stable Diffusion changed the world in 2022. But in 2026, "just use SD" is no longer the default answer — Midjourney V8 wins on raw quality, Adobe Firefly wins on commercial safety. The reason SD hasn't died — and in fact has gained momentum with FLUX — is that it remains the only option for "use image AI on your own PC, with your own data, exactly the way you want, without depending on any cloud company." Midjourney can lock you out of Discord; OpenAI can change its terms of service; the SD weight file on your SSD is yours. For people who feel safer that way, SD will keep being a special tool.

FAQ

Is Stable Diffusion free?

The model itself (weight files) is free to download and use. You do need a GPU to run it (realistically an RTX 3060 12GB, around $200 used) or a cloud inference service (Runpod runs about $0.40/hour). You owe Stability AI no monthly fee.

Can I use it commercially?

Depends on the version. SD 1.5 and SDXL are fully open (CreativeML Open RAIL-M, no revenue cap). SD 3 and SD 3.5 are free for commercial use under $1M in annual revenue; above that you need a contract with Stability AI. FLUX.1 dev ships under Black Forest Labs' own non-commercial license, so commercial use of the model needs a license from Black Forest Labs. Selling the generated images themselves is unlimited on the Stability AI versions.

Which is better, Midjourney or SD?

Depends on use. If you just want one pretty image from a prompt, Midjourney is far simpler and the quality is excellent. If you need to mass-produce the same character, mix in proprietary data, drive cost down to electricity, or replicate a specific anime style, only Stable Diffusion works. Plenty of pros use both.

Which version should I start with?

SDXL 1.0 is the safest start today. Runs in 8–12GB VRAM, has a huge LoRA library on Civitai, has no commercial revenue cap, and the ecosystem is mature. For top quality go to FLUX.1 dev (recommended 16GB+ VRAM). SD 1.5 is light but a generation behind on quality — likely to leave new users wanting more.

Is FLUX a different thing from Stable Diffusion?

Technically related but from a different company. FLUX is from Black Forest Labs, founded by ex-Stability-AI engineers who built SD. It's positioned less as a successor and more as "a higher-quality open image AI." The ecosystems are separate (FLUX LoRAs don't work in SD). But in the "open-weight, locally runnable image AI" category they're the same camp, and both are first-class citizens on Civitai and ComfyUI.

Should I buy a GPU or rent cloud?

Cloud (Runpod / Replicate / Civitai's on-demand) is cheaper at low volume, say fewer than 50 images a month, at around $0.001–$0.01 per image. On per-image price alone a GPU takes tens of thousands of images to pay back, so buying makes sense if you generate heavily, train your own LoRAs, or refuse to send data off your machine. The cost-effective sweet spot for serious users is a used RTX 3090 (24GB, around $500).