
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
19.9K stars
2.1K forks
19.9K watching
Updated 2/27/2026
Health Score: 75
Weekly Growth: +0 (+0.0% this week)
Total contributors: 1
Open Issues: 280
About whisperX
WhisperX
Recall.ai - Meeting Transcription API
If you’re looking for a transcription API for meetings, consider checking out Recall.ai's Meeting Transcription API, an API that works with Zoom, Google Meet, Microsoft Teams, and more. Recall.ai diarizes by pulling the speaker data and separate audio streams from the meeting platforms, which means 100% accurate speaker diarization with actual speaker names.
This repository provides fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization.
- ⚡️ Batched inference for 70x realtime transcription using whisper large-v2
- 🪶 faster-whisper backend; requires <8GB of GPU memory for large-v2 with beam_size=5
- 🎯 Accurate word-level timestamps using wav2vec2 alignment
- 👯‍♂️ Multispeaker ASR using speaker diarization from pyannote-audio (speaker ID labels)
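
The last two features combine naturally: once each word has a start and end time, and diarization has produced speaker turns with their own time ranges, each word can be given a speaker ID label. The following is a minimal, library-free sketch of one way to do that, assigning each word to the turn it overlaps the most. It is an illustration only, not WhisperX's actual implementation; the names `Word`, `SpeakerTurn`, and `assign_speakers`, and the sample timings, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """A transcribed word with word-level timestamps (seconds)."""
    text: str
    start: float
    end: float


@dataclass
class SpeakerTurn:
    """A diarization segment: one speaker talking over a time range (seconds)."""
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals, or 0.0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    words: List[Word], turns: List[SpeakerTurn]
) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn at all are labeled None.
    """
    labeled = []
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(word.start, word.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((word.text, best_speaker))
    return labeled


# Hypothetical sample data: three aligned words and two speaker turns.
words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.1, 1.3)]
turns = [SpeakerTurn("SPEAKER_00", 0.0, 1.0), SpeakerTurn("SPEAKER_01", 1.0, 2.0)]

labeled = assign_speakers(words, turns)
for text, speaker in labeled:
    print(f"{speaker}: {text}")
```

The nested loop is quadratic in the worst case, which is fine for short clips; for long recordings, sorting both lists by start time and sweeping through them once would be a reasonable refinement.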
