# Modular Flux Upscale

Tiled image upscaling for Flux using MultiDiffusion latent-space blending. Produces seamless upscaled output without tile-boundary artifacts.

Built with Modular Diffusers, composing reusable Flux blocks into a tiled upscaling workflow with optional ControlNet conditioning.
## Install

```bash
pip install git+https://github.com/huggingface/diffusers.git transformers accelerate safetensors sentencepiece protobuf
```

Requires diffusers from `main` (modular diffusers support).
## Quick start

```python
from diffusers import ModularPipelineBlocks
from diffusers.models.controlnets.controlnet_flux import FluxControlNetModel
import torch

blocks = ModularPipelineBlocks.from_pretrained(
    "akshan-main/modular-flux-upscale",
    trust_remote_code=True,
)
pipe = blocks.init_pipeline("black-forest-labs/FLUX.1-dev")
pipe.load_components(torch_dtype=torch.bfloat16)

controlnet = FluxControlNetModel.from_pretrained(
    "jasperai/Flux.1-dev-Controlnet-Upscaler", torch_dtype=torch.bfloat16
)
pipe.update_components(controlnet=controlnet)
pipe.to("cuda")

image = ...  # your PIL image
result = pipe(
    prompt="high quality, detailed, sharp",
    image=image,
    control_image=image,
    controlnet_conditioning_scale=1.0,
    upscale_factor=2.0,
    num_inference_steps=28,
    generator=torch.Generator("cuda").manual_seed(42),
    output="images",
)
result[0].save("upscaled.png")
```
## How it works

1. The input image is upscaled to the target resolution using Lanczos interpolation
2. The upscaled image is encoded to latent space via the Flux VAE
3. Noise is added to the latents based on `strength`
4. Latents are packed into sequence format for the Flux transformer
5. At each denoising timestep, the transformer runs on overlapping latent tiles with RoPE-aware position IDs. Noise predictions from all tiles are blended using boundary-aware cosine weights (MultiDiffusion)
6. One scheduler step is taken on the full blended prediction
7. After all timesteps, the denoised latents are unpacked and decoded back to pixel space
8. For large upscale factors with `progressive=True`, steps 1-7 repeat as multiple passes

ControlNet is optional but recommended. Without it, the model can hallucinate new content instead of enhancing existing detail.
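The boundary-aware cosine blend at the heart of MultiDiffusion can be sketched in one dimension. This is a pure-Python illustration, not the pipeline's actual code; the function names and the exact ramp shape are assumptions. Interior tile edges cross-fade over the overlap region, while edges touching the image boundary keep full weight, and each position is normalized by the sum of weights that cover it:

```python
import math

def tile_weights(tile_len, overlap, at_start, at_end):
    """Cosine ramp weights along one tile axis (illustrative sketch).

    Interior edges ramp from ~0 to 1 over the overlap so adjacent tiles
    cross-fade; edges that touch the image boundary keep full weight.
    """
    w = [1.0] * tile_len
    for i in range(overlap):
        ramp = 0.5 * (1.0 - math.cos(math.pi * (i + 0.5) / overlap))
        if not at_start:                       # fade in at the leading edge
            w[i] = min(w[i], ramp)
        if not at_end:                         # fade out at the trailing edge
            w[tile_len - 1 - i] = min(w[tile_len - 1 - i], ramp)
    return w

def blend_1d(tile_preds, starts, total_len, tile_len, overlap):
    """MultiDiffusion blend: weighted average of overlapping tile predictions."""
    num = [0.0] * total_len
    den = [0.0] * total_len
    for pred, start in zip(tile_preds, starts):
        w = tile_weights(
            tile_len, overlap,
            at_start=(start == 0),
            at_end=(start + tile_len == total_len),
        )
        for i in range(tile_len):
            num[start + i] += w[i] * pred[i]
            den[start + i] += w[i]
    return [n / d for n, d in zip(num, den)]
```

Because the opposing cosine ramps sum to 1 in the overlap, two tiles predicting different values transition smoothly instead of producing a visible seam.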
## Examples

### 2x upscale with ControlNet Upscaler

```python
result = pipe(
    prompt="high quality, detailed, sharp",
    image=image,
    control_image=image,
    controlnet_conditioning_scale=1.0,
    upscale_factor=2.0,
    num_inference_steps=28,
    generator=torch.Generator("cuda").manual_seed(42),
    output="images",
)
```
### Progressive upscale

Automatically splits into multiple passes. Auto-strength scales the denoise strength per pass.

```python
result = pipe(
    prompt="high quality, detailed, sharp",
    image=image,
    control_image=image,
    controlnet_conditioning_scale=1.0,
    upscale_factor=4.0,
    progressive=True,
    generator=torch.Generator("cuda").manual_seed(42),
    output="images",
)
```

To disable progressive mode:

```python
result = pipe(..., upscale_factor=4.0, progressive=False, strength=0.2)
```
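One plausible way a large factor could be split into passes is equal per-pass factors capped at 2x. This is a hypothetical policy for illustration only; `plan_passes` and its cap are not part of the pipeline's API, and the actual schedule may differ:

```python
import math

def plan_passes(upscale_factor, max_per_pass=2.0):
    """Split a large upscale factor into equal passes of at most max_per_pass.

    Hypothetical illustration of progressive mode; not the pipeline's code.
    """
    # Number of passes needed so each stays within the cap
    # (small epsilon guards against float rounding in the log).
    n = max(1, math.ceil(math.log(upscale_factor, max_per_pass) - 1e-9))
    per_pass = upscale_factor ** (1.0 / n)  # equal factor per pass
    return [per_pass] * n

plan_passes(4.0)  # -> [2.0, 2.0]: two 2x passes instead of one 4x pass
```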
### Without ControlNet

For cases where you want the model to add creative detail. Use a lower strength.

```python
result = pipe(
    prompt="high quality, detailed, sharp",
    image=image,
    upscale_factor=2.0,
    strength=0.25,
    auto_strength=False,
    num_inference_steps=28,
    output="images",
)
```
## Parameters

| Parameter | Default | Description |
|---|---|---|
| `image` | required | Input image (PIL) |
| `prompt` | `""` | Text prompt |
| `upscale_factor` | `2.0` | Scale multiplier |
| `strength` | `0.35` | Denoise strength. Lower = closer to input. Ignored when `auto_strength=True` |
| `num_inference_steps` | `28` | Denoising steps |
| `guidance_scale` | `3.5` | Flux guidance embedding scale |
| `latent_tile_size` | `64` | Tile size in latent pixels (64 = 512px) |
| `latent_overlap` | `16` | Tile overlap in latent pixels (16 = 128px) |
| `control_image` | `None` | ControlNet conditioning image |
| `controlnet_conditioning_scale` | `1.0` | ControlNet strength |
| `progressive` | `True` | Split large upscale factors into multiple passes |
| `auto_strength` | `True` | Auto-scale strength based on upscale factor and pass index |
| `generator` | `None` | Torch generator for reproducibility |
| `output` | `"images"` | Output key |
## Tuning guide

`strength` - how much the model changes the image.
- 0.15-0.25: minimal changes, mostly sharpening
- 0.25-0.35: balanced enhancement (default with `auto_strength`)
- 0.4+: significant changes, risk of drift
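A way to build intuition for strength: in the common diffusers img2img convention, it determines how many of the scheduled timesteps are actually run. This sketch follows that convention; the modular pipeline may differ in detail:

```python
def denoise_window(num_inference_steps, strength):
    """Map strength to the denoising window (standard img2img convention).

    Returns (first step index, number of steps actually run). Lower strength
    skips more of the early, high-noise steps, so less of the image changes.
    """
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return t_start, num_inference_steps - t_start

denoise_window(28, 0.35)  # -> (19, 9): only 9 of 28 steps are run
```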
`latent_tile_size` - tile size for MultiDiffusion.
- 64 (512px): works on most GPUs. Recommended
- 96 (768px): smoother output
- Below 64: may produce artifacts due to insufficient context
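The tile and overlap numbers translate into concrete tile positions along each latent axis. The helper below is illustrative (not the pipeline's code), assuming the Flux VAE's 8x spatial downsampling, so 64 latent pixels correspond to 512 image pixels:

```python
def tile_starts(length, tile=64, overlap=16):
    """Start offsets of overlapping tiles along one latent axis (illustrative).

    Stride is tile - overlap; the last tile is shifted back so it ends
    exactly at the boundary instead of running past it.
    """
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # final tile flush with the edge
    return starts

# A 1024px-wide image is 128 latent pixels wide (1024 / 8):
tile_starts(128)  # -> [0, 48, 64]
```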
`controlnet_conditioning_scale` - ControlNet influence.
- 1.0: very faithful to input. Recommended
- 0.7-0.8: slight creative freedom
- Below 0.5: too weak, causes hallucination

`guidance_scale` - Flux guidance embedding strength.
- 2-3: softer, more natural
- 3.5: standard
- 5+: more contrast
## Limitations

- `latent_tile_size` below 64 may produce artifacts
- Very small inputs produce distortion; use progressive mode
- ControlNet is optional but recommended for faithful upscaling
- FLUX.1-dev is a gated model; accept the license at https://huggingface.co/black-forest-labs/FLUX.1-dev
- Flux does not use negative prompts
- Not suitable for upscaling text, line art, or pixel art
## Architecture

```
FluxUpscaleMultiDiffusionBlocks (SequentialPipelineBlocks)
├── text_encoder     Flux TextEncoderStep (CLIP + T5, reused)
├── upscale          Lanczos upscale step
├── input            Flux InputStep (reused)
├── set_timesteps    Flux SetTimestepsStep (reused)
└── multidiffusion   MultiDiffusion step
    ├── VAE encode full image
    ├── per timestep: transformer on each packed tile, cosine-weighted blend
    └── VAE decode full latents
```
## Models

- Base: `black-forest-labs/FLUX.1-dev`
- ControlNet (optional): `jasperai/Flux.1-dev-Controlnet-Upscaler`

## References

- MultiDiffusion (Bar-Tal et al., 2023) - tiled latent-space blending algorithm
- Modular Diffusers - the Hugging Face framework this pipeline is built on
- Modular Diffusers contribution call
- ControlNet Upscaler - upscaling-specific ControlNet for Flux
## Tested on

- Google Colab A100 (bfloat16)
- 2x: 512x512 to 1024x1024
- 4x progressive: 256x256 to 1024x1024