Zhaoye Fei

ngc7293

https://ngc7292.github.io/

AI & ML interests

NLP & Ro.

Recent Activity

authored a paper 9 days ago

MOSS-TTS Technical Report

authored a paper 9 days ago

World Action Models: The Next Frontier in Embodied AI

upvoted a paper 9 days ago

Robust Speech Recognition via Large-Scale Weak Supervision

Paper • 2212.04356 • Published Dec 6, 2022 • 54

upvoted 2 papers 10 days ago

MOSS-TTS Technical Report

Paper • 2603.18090 • Published Mar 18 • 13

World Action Models: The Next Frontier in Embodied AI

Paper • 2605.12090 • Published 11 days ago • 64

upvoted a paper 16 days ago

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 178

upvoted a collection 17 days ago

MOVA

3 items • Updated Apr 20 • 22

upvoted 2 collections about 1 month ago

MOSS-TTS

10 items • Updated Apr 20 • 32

MOSS-Audio

An open-source audio understanding model supporting speech recognition, environmental sound analysis, music understanding, time-aware QA, and complex • 7 items • Updated 21 days ago • 55

upvoted 4 papers 3 months ago

MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models

Paper • 2602.10934 • Published Feb 11 • 49

Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

Paper • 2602.10090 • Published Feb 10 • 53

Prism: Spectral-Aware Block-Sparse Attention

Paper • 2602.08426 • Published Feb 9 • 38

MOVA: Towards Scalable and Synchronized Video-Audio Generation

Paper • 2602.08794 • Published Feb 9 • 159

upvoted 3 papers 4 months ago

HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Paper • 2601.14724 • Published Jan 21 • 75

ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

Paper • 2601.11077 • Published Jan 16 • 67

FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs

Paper • 2601.13836 • Published Jan 20 • 37

upvoted 3 papers 5 months ago

MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization

Paper • 2601.01554 • Published Jan 4 • 60

DiRL: An Efficient Post-Training Framework for Diffusion Language Models

Paper • 2512.22234 • Published Dec 23, 2025 • 22

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Paper • 2512.23576 • Published Dec 29, 2025 • 66

upvoted 2 papers 6 months ago

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 242

SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

Paper • 2511.15605 • Published Nov 19, 2025 • 25

upvoted a paper 7 months ago

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published Oct 30, 2025 • 115