arxiv:2509.20427

Seedream 4.0: Toward Next-generation Multimodal Image Generation

Published on Sep 24, 2025 · Submitted by wujie10558@gmail.com on Sep 26, 2025

Authors: Lixue Gong, Huafeng Kuang, Yanzuo Lu

Abstract

AI-generated summary: Seedream 4.0 is a high-performance multimodal image generation system that integrates text-to-image synthesis, image editing, and multi-image composition using a diffusion transformer and VAE, achieving state-of-the-art results with efficient training and inference.

We introduce Seedream 4.0, an efficient and high-performance multimodal image generation system that unifies text-to-image (T2I) synthesis, image editing, and multi-image composition within a single framework. We develop a highly efficient diffusion transformer with a powerful VAE that can also considerably reduce the number of image tokens. This allows for efficient training of our model and enables it to quickly generate native high-resolution images (e.g., 1K-4K). Seedream 4.0 is pretrained on billions of text-image pairs spanning diverse taxonomies and knowledge-centric concepts. Comprehensive data collection across hundreds of vertical scenarios, coupled with optimized strategies, ensures stable, large-scale training with strong generalization. By incorporating a carefully fine-tuned VLM, we perform multimodal post-training that trains T2I and image editing tasks jointly. For inference acceleration, we integrate adversarial distillation, distribution matching, and quantization, as well as speculative decoding. The model achieves an inference time of up to 1.8 seconds for generating a 2K image (without an LLM/VLM as the PE model). Comprehensive evaluations reveal that Seedream 4.0 can achieve state-of-the-art results on both T2I and multimodal image editing. In particular, it demonstrates exceptional multimodal capabilities in complex tasks, including precise image editing and in-context reasoning; it also supports multi-image reference and can generate multiple output images. This extends traditional T2I systems into a more interactive and multidimensional creative tool, pushing the boundary of generative AI for both creativity and professional applications. Seedream 4.0 is now accessible at https://www.volcengine.com/experience/ark?launch=seedream.

Community

wujie10

Paper author Paper submitter Sep 26, 2025

Seedream 4.0 Technical Report

ajuarj

Sep 26, 2025

Ayar2001

Sep 26, 2025

Create a cat rising sun.

pankajis95

Sep 26, 2025

car

Tunimi

Sep 26, 2025

Create a white cat on stack and bitcoin

librarian-bot

Sep 27, 2025

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

ChengyouJia

Sep 28, 2025

•

edited Sep 28, 2025

Hi, I recently noticed the release of ByteDance’s Seedream 4.0, which is impressive work. I am particularly interested in the multi-image output capability. In our recent paper, "Why Settle for One? Text-to-ImageSet Generation and Evaluation" (https://arxiv.org/abs/2506.23275), we propose the more challenging task of Text-to-ImageSet (T2IS) generation, which aims to create coherent image sets under diverse consistency requirements. To study this problem systematically, we introduced T2IS-Bench (596 diverse instructions across 26 subcategories) and T2IS-Eval, an evaluation framework for multifaceted set-level consistency assessment. Given the overlap, our benchmark and evaluation framework seem particularly suitable for assessing multi-image input and composite editing performance in Seedream 4.0. I wonder whether your team has noticed our work and whether you would be interested in extending experiments in this direction. I would be very happy to see a potential collaboration on this topic. My email: cp3jia@stu.xjtu.edu.cn.

jeo123

Oct 15, 2025

•

edited Oct 15, 2025

Spent 5 mins on Seedream 4.0—my freelance social workflow’s changed, no cap. Used to waste 2hrs fixing generic AI graphics… now “boho candle posts” gets 6 4K options. No more color tweaks. AI design feeling like a guess? Try: https://www.seedream-4.net/

Yara12234

Nov 3, 2025

•

edited Nov 3, 2025

No description provided.

1234ZAMAN

Nov 7, 2025

I wish that it was open source

1234ZAMAN

Dec 5, 2025

Now because of Seedream 4.0 is opensource 🥳 I shaped internet with my message

Brice94

Dec 17, 2025

Awesome and fast, good job
image color changer

alex201212

Apr 10

•

edited Apr 11

Seedream 4.0 looks incredibly impressive — the multimodal approach to image generation is clearly a step forward, and it's exciting to see the field pushing in this direction!
Honestly, we're living in a golden age of AI image generation right now. Seedream 4.0, Google's Imagen, and GPT Image 2 are all raising the bar in different ways. What I appreciate about GPT Image 2 in particular is how well it handles text rendering inside images — something most tools still struggle with. Great time to be a creator!

ai-shu

28 days ago

Loved reading this! You made some really good points. I’ve been building something related too, you can check it out at https://happy-horse.pro/.

ai-shu

28 days ago

Nice post! Super interesting read. If you’d like, feel free to check out my related project at https://cdance.net/.

ai-shu

27 days ago

This is a really insightful article. If you’re interested in AI image tools, you might also like https://gptimg2.art, which helps generate images from text prompts easily and quickly.

mochicheng

22 days ago

This error is so frustrating! I had to ask my admin to update the policy. By the way, if you ever want a fun way to visualize your name, check out Your Name in Landsat – it turns names into satellite image letters.

Williamdd33

8 days ago

If you’re interested in AI image tools, you might also like SVGGenerator.org,
an AI-powered SVG generator that helps you create vector graphics, icons, logos, and illustrations from text prompts quickly and easily.

dkphhh

7 days ago

Generating a 2K image in just 1.8 seconds is impressive, especially when I compare it to the hassle of using Video to Text for my meeting notes in a crowded cafe. It makes me wonder if I should switch my workflow to this unified system for my daily creative tasks.

dkphhh

7 days ago

•

edited 7 days ago

I was surprised to see Seedream 4.0 generate 2K images in just 1.8 seconds while scrolling through my feed, and honestly, the multi-image output is quite impressive for such complex tasks. It makes me wish I could just Read PDF Aloud the full report during my coffee break to catch every technical detail without staring at the screen.

mochicheng

6 days ago

This list is awesome! I love seeing all the creative projects. Speaking of creativity, I've been using living the grid to make custom pixel art for my Tomodachi Life game. It's so fun!

gptimage2prompts

3 days ago

The fall-themed designs you mentioned sound lovely, and I can see how sharing the process on Instagram helps build a community around watercolor. For anyone wanting to experiment with different visual styles digitally, gpt image 2 prompts offers another way to explore creative workflows beyond traditional painting.