LocalVocal: Local Live Captions & Translation On-the-Go
Quote from Mrr Zed0 on March 29, 2025, 1:23 pm
LocalVocal plugin allows you to transcribe & translate speech into text locally on your machine in real time.
No GPU required*,
no cloud costs,
no network and
minimal lag! Privacy first – all data stays on your machine. (* GPU acceleration via CUDA or AMD is supported!)
If this plugin has been valuable to you, consider adding a star to the GH repo or rating it here on OBS.
Do more with LocalVocal:
https://youtu.be/Q34LQsx-nlg | https://youtu.be/4BTmoKr0YMw | https://youtu.be/E7HKbO6CP_c
Realtime Translation with DeepL | Translate Apps and Videos | 2-minute setup
The plugin adds an Audio Filter – use it on a speech source (mic, video) to get a transcription. Send the captions to a Text Source to show them on scene.
Current Features:
- Transcribe audio to text in real time in 100 languages
- Display captions on screen using text sources
- Send captions to a .txt or .srt file (to be read by external sources or used for video playback), with or without the aggregation option
- Captions synced with OBS recording timestamps
- Send captions on an RTMP stream to e.g. YouTube, Twitch
- Bring your own Whisper model (any GGML)
- Translate captions in real time to major languages (both Whisper built-in translation as well as NMT models with CTranslate2)
- CUDA, OpenCL, Apple Arm64, AVX & SSE acceleration support
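For anyone who has not worked with the .srt output format mentioned in the feature list, here is a minimal sketch of how one timed caption maps to an SRT cue. This is not the plugin's own code; the function names `srt_timestamp` and `srt_cue` are made up for illustration, and the timings and text are example values.

```python
def srt_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_cue(index: int, start_ms: int, end_ms: int, text: str) -> str:
    """Build one SRT cue: sequence number, time range, text, blank line."""
    time_range = f"{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}"
    return f"{index}\n{time_range}\n{text}\n\n"


if __name__ == "__main__":
    # Hypothetical caption, 1.5 s to 4.25 s into the recording.
    print(srt_cue(1, 1_500, 4_250, "Hello and welcome to the stream."), end="")
```

Each cue is a sequence number, a start and end time separated by `-->`, the caption text, and a blank line. That is the structure video players expect when loading an .srt file next to a recording.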
Roadmap:
- More robust built-in translation options
- Additional output options: .vtt, .ssa, .sub, etc.
- Speaker diarization (detecting speakers in a multi-person audio stream)
Internally, the plugin runs a neural network (OpenAI Whisper) locally to transcribe speech in real time and provide captions.
It uses the Whisper.cpp project from ggerganov to run the Whisper network efficiently on CPUs and GPUs. For translation, it uses CTranslate2 and the M2M100 model.
If you use this plugin – let us know! We would love to feature your work/vids and showcase your success.
Check out our other plugins:
- Background Removal removes the background from a webcam feed without a green screen.
- Detect detects and tracks more than 80 types of objects in real time inside OBS.
- URL/API Source fetches live data from an API and displays it in OBS.
If you are a broadcasting company or service looking to integrate local AI technology into your pipelines – reach out to inquire about our enterprise services.
Download Link:
https://obsproject.com/forum/resources/localvocal-local-live-captions-translation-on-the-go.1769/