MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation Paper • 2511.22989 • Published Nov 28, 2025 • 17
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 189
Article RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation Alibaba-DAMO-Academy • Aug 11, 2025 • 28
Awesome SFT datasets Collection A curated list of interesting datasets to fine-tune language models with. • 41 items • Updated Mar 2 • 152
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20, 2025 • 41
OpenX-LeRobot Collection Open X-Embodiment datasets in LeRobot format with standard transformation (https://github.com/Tavish9/any4lerobot) • 32 items • Updated Mar 2 • 36
Magma: A Foundation Model for Multimodal AI Agents Paper • 2502.13130 • Published Feb 18, 2025 • 58
Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control +2 danaaubakirova, Molbap, mshukor, cadene • Feb 4, 2025 • 192
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution Paper • 2312.06640 • Published Dec 11, 2023 • 49
Cosmos Collection ⚠️ This collection is archived. https://huggingface.co/collections/nvidia/nvidia-cosmos-2 • 14 items • Updated 3 days ago • 302
PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining Paper • 2303.08789 • Published Mar 15, 2023 • 2
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published Nov 26, 2024 • 53
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control Paper • 2411.13807 • Published Nov 21, 2024 • 11
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games Paper • 2411.13543 • Published Nov 20, 2024 • 19
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published Nov 4, 2024 • 34
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation Paper • 2411.04709 • Published Nov 5, 2024 • 27
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Paper • 2411.07975 • Published Nov 12, 2024 • 32