baojian1024's Collections

Audio

updated 5 days ago

  • microsoft/VibeVoice-ASR

    Automatic Speech Recognition • 9B • Updated Jan 27 • 507k • 1.16k

  • CohereLabs/cohere-transcribe-03-2026

    Automatic Speech Recognition • 2B • Updated 7 days ago • 389k • 973

  • JiongzeYu/SparkVSR

    Updated Apr 4 • 693 • 58

  • smthem/SparkVSR-GGUF

    6B • Updated Mar 25 • 98 • 5

  • microsoft/VibeVoice-1.5B

    Text-to-Speech • 3B • Updated Jan 22 • 51.1k • 2.39k

  • microsoft/VibeVoice-Realtime-0.5B

    Text-to-Speech • 1B • Updated Dec 12, 2025 • 808k • 1.23k

  • meituan-longcat/LongCat-AudioDiT-3.5B

    4B • Updated Apr 3 • 642 • 73

  • openbmb/VoxCPM2

    Text-to-Speech • 2B • Updated Apr 16 • 245k • 1.37k

  • k2-fsa/OmniVoice

    Text-to-Speech • 0.6B • Updated 29 days ago • 2.49M • 980

  • YJX-Xiaomi/ControlFoley

    Text-to-Audio • Updated 22 days ago • 84 • 12

  • Xanthius/Ace-Step-1.5-XL-Concept-Sliders

    Updated 25 days ago • 15

  • Supertone/supertonic-3

    Text-to-Speech • Updated 18 days ago • 61.3k • 799