Audio Course documentation
Supplemental reading and resources
This unit provided a hands-on introduction to speech recognition, one of the most popular tasks in the audio domain. Want to learn more? The resources below go deeper into the topics covered in this unit.
- Whisper Talk by Jong Wook Kim: a presentation by Whisper author Jong Wook Kim covering the model's motivation, architecture, training and results
- End-to-End Speech Benchmark (ESB): a paper that comprehensively argues for using the orthographic WER as opposed to the normalised WER for evaluating ASR systems and presents an accompanying benchmark
- Fine-Tuning Whisper for Multilingual ASR: an in-depth blog post that explains in more detail how the Whisper model works, along with the pre- and post-processing steps performed by the feature extractor and tokenizer
- Fine-tuning MMS Adapter Models for Multi-Lingual ASR: an end-to-end guide for fine-tuning Meta AI’s new MMS speech recognition models, freezing the base model weights and only fine-tuning a small number of adapter layers
- Boosting Wav2Vec2 with n-grams in 🤗 Transformers: a blog post on combining CTC models with external language models (LMs) to combat spelling and punctuation errors
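To make the distinction between orthographic and normalised WER (mentioned in the ESB entry above) more concrete, here is a minimal sketch in plain Python. It is an illustration only: `word_error_rate` and `simple_normalise` are hypothetical helper names, and the normaliser (lowercasing and stripping punctuation) is a deliberately simplified assumption, not the normalisation used in the ESB paper or by Whisper.

```python
import re


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def simple_normalise(text: str) -> str:
    """Hypothetical, simplified normaliser: lowercase and strip punctuation."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


reference = "Hello, world! What a nice day."
hypothesis = "hello world what a nice day"

# Orthographic WER: casing and punctuation differences count as errors.
orthographic_wer = word_error_rate(reference, hypothesis)

# Normalised WER: both strings are normalised before scoring.
normalised_wer = word_error_rate(
    simple_normalise(reference), simple_normalise(hypothesis)
)

print(f"orthographic WER: {orthographic_wer:.2f}")
print(f"normalised WER:   {normalised_wer:.2f}")
```

In this toy example the hypothesis contains the right words but no casing or punctuation, so the orthographic WER penalises four of the six reference words while the normalised WER is zero. That gap is the kind of formatting error the orthographic metric is meant to expose.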