- HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning
  Paper • 2603.17024 • Published • 106
- WorldAgents: Can Foundation Image Models be Agents for 3D World Models?
  Paper • 2603.19708 • Published • 12
- MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data
  Paper • 2603.25319 • Published • 32
- ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions
  Paper • 2603.25791 • Published • 3
ZhengQi Wan
Vanqi
AI & ML interests
None yet
Recent Activity
updated a collection about 10 hours ago
From Vision to Motion
updated a collection 3 days ago
From Vision to Motion
upvoted a paper 3 days ago
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
Organizations
None yet