VideoPrism Collection • VideoPrism is a foundational video encoder that enables state-of-the-art performance on a wide variety of video understanding tasks. • 5 items • Updated Mar 12 • 19
Article Vision Language Models (Better, faster, stronger) +3 merve, sergiopaniego, ariG23498, pcuenq, andito • May 12, 2025 • 613
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published Jan 7, 2025 • 49
Article PaliGemma – Google's Cutting-Edge Open Vision Language Model +1 merve, andsteing, pcuenq • May 14, 2024 • 287
Moshi v0.1 Release Collection • MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 16 items • Updated Dec 24, 2025 • 243
Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization Paper • 2212.10445 • Published Dec 20, 2022 • 2