Modular Flux Upscale

Tiled image upscaling for Flux using MultiDiffusion latent-space blending. Produces seamless upscaled output without tile boundary artifacts.

Built with Modular Diffusers, composing reusable Flux blocks into a tiled upscaling workflow with optional ControlNet conditioning.


Install

pip install git+https://github.com/huggingface/diffusers.git transformers accelerate safetensors sentencepiece protobuf

Requires diffusers installed from main for Modular Diffusers support.

Quick start

from diffusers import ModularPipelineBlocks
from diffusers.models.controlnets.controlnet_flux import FluxControlNetModel
import torch

blocks = ModularPipelineBlocks.from_pretrained(
    "akshan-main/modular-flux-upscale",
    trust_remote_code=True,
)

pipe = blocks.init_pipeline("black-forest-labs/FLUX.1-dev")
pipe.load_components(torch_dtype=torch.bfloat16)

controlnet = FluxControlNetModel.from_pretrained(
    "jasperai/Flux.1-dev-Controlnet-Upscaler", torch_dtype=torch.bfloat16
)
pipe.update_components(controlnet=controlnet)
pipe.to("cuda")

image = ...  # your PIL image

result = pipe(
    prompt="high quality, detailed, sharp",
    image=image,
    control_image=image,
    controlnet_conditioning_scale=1.0,
    upscale_factor=2.0,
    num_inference_steps=28,
    generator=torch.Generator("cuda").manual_seed(42),
    output="images",
)
result[0].save("upscaled.png")

How it works

  1. Input image is upscaled to the target resolution using Lanczos interpolation
  2. Upscaled image is encoded to latent space via the Flux VAE
  3. Noise is added to the latents based on strength
  4. Latents are packed into sequence format for the Flux transformer
  5. At each denoising timestep, the transformer runs on overlapping latent tiles with RoPE-aware position IDs. Noise predictions from all tiles are blended using boundary-aware cosine weights (MultiDiffusion)
  6. One scheduler step is taken on the full blended prediction
  7. After all timesteps, denoised latents are unpacked and decoded back to pixel space
  8. For large upscale factors with progressive=True, steps 1-7 repeat as multiple passes

ControlNet is optional but recommended. Without it, the model can hallucinate new content instead of enhancing existing detail.
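The tile blending in step 5 can be sketched as follows. This is an illustrative NumPy implementation of MultiDiffusion-style cosine blending, not the block's actual code; `tile_weights` and `blend` are hypothetical helper names, and the real implementation operates on packed Flux latents rather than plain 2D arrays.

```python
import numpy as np

def tile_weights(tile_size: int, overlap: int) -> np.ndarray:
    """Boundary-aware blend weights for one square latent tile.

    Weights are 1.0 in the tile interior and ramp to 0 across the
    overlap band with a raised-cosine profile, so predictions from
    overlapping tiles transition smoothly across seams.
    """
    ramp = 0.5 * (1.0 - np.cos(np.linspace(0.0, np.pi, overlap)))  # 0 -> 1
    profile = np.ones(tile_size)
    profile[:overlap] = ramp            # fade in at the leading edge
    profile[-overlap:] = ramp[::-1]     # fade out at the trailing edge
    return np.outer(profile, profile)   # separable 2D weight map

def blend(tiles, positions, weights, out_shape):
    """Accumulate per-tile noise predictions into one full-latent map,
    normalizing by the summed weights (MultiDiffusion blending)."""
    acc = np.zeros(out_shape)
    norm = np.zeros(out_shape)
    for tile, (y, x) in zip(tiles, positions):
        h, w = tile.shape
        acc[y:y + h, x:x + w] += tile * weights[:h, :w]
        norm[y:y + h, x:x + w] += weights[:h, :w]
    return acc / np.maximum(norm, 1e-8)
```

In the overlap band each pixel's output is a weighted average of the predictions from every tile covering it, which is what removes visible tile boundaries.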

Examples

2x upscale with ControlNet Upscaler

result = pipe(
    prompt="high quality, detailed, sharp",
    image=image,
    control_image=image,
    controlnet_conditioning_scale=1.0,
    upscale_factor=2.0,
    num_inference_steps=28,
    generator=torch.Generator("cuda").manual_seed(42),
    output="images",
)

Progressive upscale

Automatically splits large upscale factors into multiple passes. With auto_strength enabled, the denoise strength is scaled for each pass.

result = pipe(
    prompt="high quality, detailed, sharp",
    image=image,
    control_image=image,
    controlnet_conditioning_scale=1.0,
    upscale_factor=4.0,
    progressive=True,
    generator=torch.Generator("cuda").manual_seed(42),
    output="images",
)

To disable progressive mode:

result = pipe(..., upscale_factor=4.0, progressive=False, strength=0.2)
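One plausible way progressive mode could split a large factor into passes is shown below. `progressive_factors` is a hypothetical helper for illustration only; the actual block may use a different split or cap per-pass scale differently.

```python
import math

def progressive_factors(upscale_factor: float, max_per_pass: float = 2.0):
    """Split an upscale factor into equal per-pass factors no larger
    than max_per_pass, e.g. 4.0 -> two 2.0x passes.

    Hypothetical sketch of progressive-mode scheduling; not the
    workflow's actual pass logic.
    """
    n_passes = max(1, math.ceil(math.log(upscale_factor, max_per_pass)))
    per_pass = upscale_factor ** (1.0 / n_passes)
    return [per_pass] * n_passes
```

Running several moderate passes instead of one large jump keeps each pass close to the resolutions the model was trained on, which is why progressive mode helps with very small inputs.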

Without ControlNet

Use this when you want the model to add creative detail rather than strictly enhance existing content. Use a lower strength to limit drift from the input.

result = pipe(
    prompt="high quality, detailed, sharp",
    image=image,
    upscale_factor=2.0,
    strength=0.25,
    auto_strength=False,
    num_inference_steps=28,
    output="images",
)

Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| image | required | Input image (PIL) |
| prompt | "" | Text prompt |
| upscale_factor | 2.0 | Scale multiplier |
| strength | 0.35 | Denoise strength; lower = closer to input. Ignored when auto_strength=True |
| num_inference_steps | 28 | Denoising steps |
| guidance_scale | 3.5 | Flux guidance embedding scale |
| latent_tile_size | 64 | Tile size in latent pixels (64 = 512px) |
| latent_overlap | 16 | Tile overlap in latent pixels (16 = 128px) |
| control_image | None | ControlNet conditioning image |
| controlnet_conditioning_scale | 1.0 | ControlNet strength |
| progressive | True | Split large upscale factors into multiple passes |
| auto_strength | True | Auto-scale strength based on upscale factor and pass index |
| generator | None | Torch generator for reproducibility |
| output | "images" | Output key |

Tuning guide

strength - how much the model changes the image.

  • 0.15-0.25: minimal changes, mostly sharpening
  • 0.25-0.35: balanced enhancement (default with auto_strength)
  • 0.4+: significant changes, risk of drift
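In diffusers image-to-image pipelines, strength determines how many of the scheduled denoising steps are actually run: the latents start partway into the noise schedule. A minimal sketch, assuming this workflow follows the standard diffusers img2img convention (`steps_for_strength` is a hypothetical name):

```python
def steps_for_strength(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually executed for a given
    strength, per the usual diffusers img2img timestep slicing:
    strength=1.0 runs the full schedule, lower values skip the
    early (high-noise) steps and stay closer to the input."""
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start
```

So with the defaults (28 steps, strength 0.35), only a handful of low-noise steps run, which is why the output stays faithful to the input image.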

latent_tile_size - tile size for MultiDiffusion.

  • 64 (512px): works on most GPUs. Recommended
  • 96 (768px): smoother output
  • Below 64: may produce artifacts due to insufficient context
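The tile/overlap geometry along one latent axis can be sketched as below (64 latent pixels correspond to 512 image pixels via the Flux VAE's 8x spatial downsampling). `tile_positions` is an illustrative helper, not the block's actual tiling code:

```python
def tile_positions(size: int, tile: int = 64, overlap: int = 16):
    """Top-left coordinates of overlapping tiles along one latent axis.

    Stride is tile - overlap; the final tile is shifted back so it
    ends exactly at the latent boundary, guaranteeing full coverage.
    """
    stride = tile - overlap
    positions = list(range(0, max(size - tile, 0) + 1, stride))
    if positions[-1] + tile < size:
        positions.append(size - tile)
    return positions
```

For a 1024px output (128 latent pixels per axis) the defaults yield three tiles per axis, with the last tile snapped to the boundary.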

controlnet_conditioning_scale - ControlNet influence.

  • 1.0: very faithful to input. Recommended
  • 0.7-0.8: slight creative freedom
  • Below 0.5: too weak, causes hallucination

guidance_scale - Flux guidance embedding strength.

  • 2-3: softer, more natural
  • 3.5: standard
  • 5+: more contrast

Limitations

  • latent_tile_size below 64 may produce artifacts
  • Very small inputs produce distortion. Use progressive mode
  • ControlNet is optional but recommended for faithful upscaling
  • FLUX.1-dev is a gated model - accept the license at https://huggingface.co/black-forest-labs/FLUX.1-dev
  • Flux does not use negative prompts
  • Not suitable for upscaling text, line art, or pixel art

Architecture

FluxUpscaleMultiDiffusionBlocks (SequentialPipelineBlocks)
  text_encoder      Flux TextEncoderStep (CLIP + T5, reused)
  upscale           Lanczos upscale step
  input             Flux InputStep (reused)
  set_timesteps     Flux SetTimestepsStep (reused)
  multidiffusion    MultiDiffusion step
                    - VAE encode full image
                    - Per timestep: transformer on each packed tile, cosine-weighted blend
                    - VAE decode full latents


Tested on

  • Google Colab A100 (bfloat16)
  • 2x: 512x512 to 1024x1024
  • 4x progressive: 256x256 to 1024x1024