
Adam Leary

beatgeek

AI & ML interests

Gen AI, LLMs, Multimodal, Speech

upvoted an article 4 months ago

Differential Transformer V2

Article • microsoft • Jan 20 • 51

upvoted a collection 11 months ago

VideoPrism

VideoPrism is a foundational video encoder that enables state-of-the-art performance on a large variety of video understanding tasks. • 5 items • Updated Mar 12 • 19

upvoted an article about 1 year ago

Vision Language Models (Better, faster, stronger)

Article • merve, sergiopaniego, ariG23498, pcuenq, andito • May 12, 2025 • 613

upvoted a paper over 1 year ago

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published Jan 7, 2025 • 49

upvoted an article over 1 year ago

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Article • merve, andsteing, pcuenq • May 14, 2024 • 287

upvoted a collection over 1 year ago

Moshi v0.1 Release

MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 16 items • Updated Dec 24, 2025 • 243

upvoted a paper over 2 years ago

Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization

Paper • 2212.10445 • Published Dec 20, 2022 • 2