Modular Flux Upscale

Tiled image upscaling for Flux using MultiDiffusion latent-space blending. Produces seamless upscaled output without tile boundary artifacts.

Built with Modular Diffusers, composing reusable Flux blocks into a tiled upscaling workflow with optional ControlNet conditioning.


Install

pip install git+https://github.com/huggingface/diffusers.git transformers accelerate safetensors sentencepiece protobuf

Requires diffusers installed from main for Modular Diffusers support.

Quick start

from diffusers import ModularPipelineBlocks
from diffusers.models.controlnets.controlnet_flux import FluxControlNetModel
import torch

blocks = ModularPipelineBlocks.from_pretrained(
    "akshan-main/modular-flux-upscale",
    trust_remote_code=True,
)

pipe = blocks.init_pipeline("black-forest-labs/FLUX.1-dev")
pipe.load_components(torch_dtype=torch.bfloat16)

controlnet = FluxControlNetModel.from_pretrained(
    "jasperai/Flux.1-dev-Controlnet-Upscaler", torch_dtype=torch.bfloat16
)
pipe.update_components(controlnet=controlnet)
pipe.to("cuda")

image = ...  # your PIL image

result = pipe(
    prompt="high quality, detailed, sharp",
    image=image,
    control_image=image,
    controlnet_conditioning_scale=1.0,
    upscale_factor=2.0,
    num_inference_steps=28,
    generator=torch.Generator("cuda").manual_seed(42),
    output="images",
)
result[0].save("upscaled.png")

How it works

  1. Input image is upscaled to the target resolution using Lanczos interpolation
  2. Upscaled image is encoded to latent space via the Flux VAE
  3. Noise is added to the latents based on strength
  4. Latents are packed into sequence format for the Flux transformer
  5. At each denoising timestep, the transformer runs on overlapping latent tiles with RoPE-aware position IDs. Noise predictions from all tiles are blended using boundary-aware cosine weights (MultiDiffusion)
  6. One scheduler step is taken on the full blended prediction
  7. After all timesteps, denoised latents are unpacked and decoded back to pixel space
  8. For large upscale factors with progressive=True, steps 1-7 repeat as multiple passes

ControlNet is optional but recommended. Without it, the model can hallucinate new content instead of enhancing existing detail.
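The tile blending in step 5 can be sketched as follows. This is an illustrative NumPy implementation of MultiDiffusion-style cosine blending, not the block's actual code; `tile_weights` and `blend` are hypothetical helper names, and the real implementation operates on packed Flux latents rather than plain 2D arrays.

```python
import numpy as np

def tile_weights(tile_size: int, overlap: int) -> np.ndarray:
    """Boundary-aware blend weights for one square latent tile.

    Weights are 1.0 in the tile interior and ramp to 0 across the
    overlap band with a raised-cosine profile, so predictions from
    overlapping tiles transition smoothly across seams.
    """
    ramp = 0.5 * (1.0 - np.cos(np.linspace(0.0, np.pi, overlap)))  # 0 -> 1
    profile = np.ones(tile_size)
    profile[:overlap] = ramp            # fade in at the leading edge
    profile[-overlap:] = ramp[::-1]     # fade out at the trailing edge
    return np.outer(profile, profile)   # separable 2D weight map

def blend(tiles, positions, weights, out_shape):
    """Accumulate per-tile noise predictions into one full-latent map,
    normalizing by the summed weights (MultiDiffusion blending)."""
    acc = np.zeros(out_shape)
    norm = np.zeros(out_shape)
    for tile, (y, x) in zip(tiles, positions):
        h, w = tile.shape
        acc[y:y + h, x:x + w] += tile * weights[:h, :w]
        norm[y:y + h, x:x + w] += weights[:h, :w]
    return acc / np.maximum(norm, 1e-8)
```

In the overlap band each pixel's output is a weighted average of the predictions from every tile covering it, which is what removes visible tile boundaries.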

Examples

2x upscale with ControlNet Upscaler

result = pipe(
    prompt="high quality, detailed, sharp",
    image=image,
    control_image=image,
    controlnet_conditioning_scale=1.0,
    upscale_factor=2.0,
    num_inference_steps=28,
    generator=torch.Generator("cuda").manual_seed(42),
    output="images",
)

Progressive upscale

Automatically splits large upscale factors into multiple passes. With auto_strength enabled, the denoise strength is scaled for each pass.

result = pipe(
    prompt="high quality, detailed, sharp",
    image=image,
    control_image=image,
    controlnet_conditioning_scale=1.0,
    upscale_factor=4.0,
    progressive=True,
    generator=torch.Generator("cuda").manual_seed(42),
    output="images",
)

To disable progressive mode:

result = pipe(..., upscale_factor=4.0, progressive=False, strength=0.2)
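One plausible way progressive mode could split a large factor into passes is shown below. `progressive_factors` is a hypothetical helper for illustration only; the actual block may use a different split or cap per-pass scale differently.

```python
import math

def progressive_factors(upscale_factor: float, max_per_pass: float = 2.0):
    """Split an upscale factor into equal per-pass factors no larger
    than max_per_pass, e.g. 4.0 -> two 2.0x passes.

    Hypothetical sketch of progressive-mode scheduling; not the
    workflow's actual pass logic.
    """
    n_passes = max(1, math.ceil(math.log(upscale_factor, max_per_pass)))
    per_pass = upscale_factor ** (1.0 / n_passes)
    return [per_pass] * n_passes
```

Running several moderate passes instead of one large jump keeps each pass close to the resolutions the model was trained on, which is why progressive mode helps with very small inputs.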

Without ControlNet

Use this when you want the model to add creative detail rather than strictly enhance existing content. Use a lower strength to limit drift from the input.

result = pipe(
    prompt="high quality, detailed, sharp",
    image=image,
    upscale_factor=2.0,
    strength=0.25,
    auto_strength=False,
    num_inference_steps=28,
    output="images",
)

Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| image | required | Input image (PIL) |
| prompt | "" | Text prompt |
| upscale_factor | 2.0 | Scale multiplier |
| strength | 0.35 | Denoise strength; lower = closer to input. Ignored when auto_strength=True |
| num_inference_steps | 28 | Denoising steps |
| guidance_scale | 3.5 | Flux guidance embedding scale |
| latent_tile_size | 64 | Tile size in latent pixels (64 = 512px) |
| latent_overlap | 16 | Tile overlap in latent pixels (16 = 128px) |
| control_image | None | ControlNet conditioning image |
| controlnet_conditioning_scale | 1.0 | ControlNet strength |
| progressive | True | Split large upscale factors into multiple passes |
| auto_strength | True | Auto-scale strength based on upscale factor and pass index |
| generator | None | Torch generator for reproducibility |
| output | "images" | Output key |

Tuning guide

strength - how much the model changes the image.

  • 0.15-0.25: minimal changes, mostly sharpening
  • 0.25-0.35: balanced enhancement (default with auto_strength)
  • 0.4+: significant changes, risk of drift
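In diffusers image-to-image pipelines, strength determines how many of the scheduled denoising steps are actually run: the latents start partway into the noise schedule. A minimal sketch, assuming this workflow follows the standard diffusers img2img convention (`steps_for_strength` is a hypothetical name):

```python
def steps_for_strength(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually executed for a given
    strength, per the usual diffusers img2img timestep slicing:
    strength=1.0 runs the full schedule, lower values skip the
    early (high-noise) steps and stay closer to the input."""
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start
```

So with the defaults (28 steps, strength 0.35), only a handful of low-noise steps run, which is why the output stays faithful to the input image.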

latent_tile_size - tile size for MultiDiffusion.

  • 64 (512px): works on most GPUs. Recommended
  • 96 (768px): smoother output
  • Below 64: may produce artifacts due to insufficient context
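The tile/overlap geometry along one latent axis can be sketched as below (64 latent pixels correspond to 512 image pixels via the Flux VAE's 8x spatial downsampling). `tile_positions` is an illustrative helper, not the block's actual tiling code:

```python
def tile_positions(size: int, tile: int = 64, overlap: int = 16):
    """Top-left coordinates of overlapping tiles along one latent axis.

    Stride is tile - overlap; the final tile is shifted back so it
    ends exactly at the latent boundary, guaranteeing full coverage.
    """
    stride = tile - overlap
    positions = list(range(0, max(size - tile, 0) + 1, stride))
    if positions[-1] + tile < size:
        positions.append(size - tile)
    return positions
```

For a 1024px output (128 latent pixels per axis) the defaults yield three tiles per axis, with the last tile snapped to the boundary.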

controlnet_conditioning_scale - ControlNet influence.

  • 1.0: very faithful to input. Recommended
  • 0.7-0.8: slight creative freedom
  • Below 0.5: too weak, causes hallucination

guidance_scale - Flux guidance embedding strength.

  • 2-3: softer, more natural
  • 3.5: standard
  • 5+: more contrast

Limitations

  • latent_tile_size below 64 may produce artifacts
  • Very small inputs produce distortion. Use progressive mode
  • ControlNet is optional but recommended for faithful upscaling
  • FLUX.1-dev is a gated model - accept the license at https://huggingface.co/black-forest-labs/FLUX.1-dev
  • Flux does not use negative prompts
  • Not suitable for upscaling text, line art, or pixel art

Architecture

FluxUpscaleMultiDiffusionBlocks (SequentialPipelineBlocks)
  text_encoder      Flux TextEncoderStep (CLIP + T5, reused)
  upscale           Lanczos upscale step
  input             Flux InputStep (reused)
  set_timesteps     Flux SetTimestepsStep (reused)
  multidiffusion    MultiDiffusion step
                    - VAE encode full image
                    - Per timestep: transformer on each packed tile, cosine-weighted blend
                    - VAE decode full latents


Tested on

  • Google Colab A100 (bfloat16)
  • 2x: 512x512 to 1024x1024
  • 4x progressive: 256x256 to 1024x1024