video
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
Paper
• 2512.24271
• Published • 64
FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation
Paper
• 2512.24724
• Published • 9
Pretraining Frame Preservation in Autoregressive Video Memory Compression
Paper
• 2512.23851
• Published • 25
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation
Paper
• 2512.24551
• Published • 21
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
Paper
• 2512.22905
• Published • 20
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
Paper
• 2512.24385
• Published • 8
Factorized Learning for Temporally Grounded Video-Language Models
Paper
• 2512.24097
• Published • 7
SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling
Paper
• 2512.23162
• Published • 14
Video-BrowseComp: Benchmarking Agentic Video Research on Open Web
Paper
• 2512.23044
• Published • 10
Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation
Paper
• 2512.21734
• Published • 5
GLM-5: from Vibe Coding to Agentic Engineering
Paper
• 2602.15763
• Published • 121