Arabic speech-to-text API with diarization and dialects
Stream Arabic audio and get punctuated, diarized transcripts in real time, across Modern Standard Arabic (MSA) and the Gulf, Egyptian and Levantine dialects, with native code-switching to English.
- Real-time WebSocket + batch endpoints
- Speaker diarization on by default
- Word-level timestamps and confidence scores
- Arabic↔English code-switching in one pass
- Custom vocabulary / brand-name hinting
- On-premises for PDPL and GDPR workloads
Built for Arabic call centres and media
Generic STT models drop accuracy by 15–25 points the moment a call switches dialect or mixes English. OpenQlik's Arabic STT is trained on hundreds of thousands of hours of telephony, broadcast and conversational audio across MENA, so accuracy holds up where it matters: support calls, sales calls, podcasts, and broadcast captioning.
Endpoints
- POST /v1/stt: batch transcription for files
- WS /v1/stt/stream: real-time partials + finals
- POST /v1/dubbing: combine STT + translation + TTS in one call
Example
curl https://openqlik.bilytica.com/api/v1/stt \
-H "Authorization: Bearer $OPENQLIK_KEY" \
-F "audio=@call.wav" \
-F "language=ar" -F "diarize=true"
Frequently asked questions
How accurate is Arabic transcription compared to English models?
On our internal MSA benchmark we measure 8–12% word error rate (WER), and 14–20% on dialect audio, depending on noise and code-switching. We publish per-dialect numbers in the docs.
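For readers unfamiliar with the metric, WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below computes it with a word-level edit distance. It illustrates the standard definition only; it is not necessarily how the benchmark figures above were produced, and the function name `word_error_rate` is made up for this example. Real evaluations usually normalize text first (punctuation, diacritics), which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

A WER of 10% therefore means roughly one word in ten needs correcting relative to the reference.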
Does it handle code-switching between Arabic and English?
Yes. The same model decodes mixed Arabic-English utterances, which is the dominant pattern in Gulf customer-support calls.
Is real-time streaming supported?
Yes. The WebSocket endpoint streams partial results roughly every 100 ms, and a batch endpoint handles files up to several hours long.
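On the client side, partial results are typically treated as provisional text that later messages overwrite, while finals are committed. The sketch below shows that bookkeeping only. The message shape (a `type` field of `"partial"` or `"final"` and a `text` field) is an assumption made for illustration, not the documented wire format, so check the API reference for the real field names.

```python
from typing import Iterable


def assemble_transcript(messages: Iterable[dict]) -> tuple[list[str], str]:
    """Fold a stream of messages into (committed finals, current pending partial).

    Assumed, hypothetical message shape: {"type": "partial" | "final", "text": str}.
    """
    finals: list[str] = []
    pending = ""
    for message in messages:
        kind = message.get("type")
        if kind == "partial":
            # A newer partial replaces the previous provisional text.
            pending = message.get("text", "")
        elif kind == "final":
            # A final commits the utterance and clears the provisional text.
            finals.append(message.get("text", ""))
            pending = ""
        # Any other message type is ignored in this sketch.
    return finals, pending


# Hypothetical messages, as they might arrive after JSON decoding.
stream = [
    {"type": "partial", "text": "hello"},
    {"type": "partial", "text": "hello and"},
    {"type": "final", "text": "Hello and welcome."},
    {"type": "partial", "text": "how can"},
]
finals, pending = assemble_transcript(stream)
print(finals, pending)
```

Keeping finals and the pending partial separate lets a UI render committed text normally and the provisional tail in a lighter style.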
Can transcripts include speaker labels?
Diarization is on by default and returns speaker IDs with each segment, plus word-level start/end timestamps.
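As an illustration of working with diarized output, the sketch below merges consecutive segments from the same speaker into readable turns. The segment fields used here (`speaker`, `start`, `end`, `text`) are hypothetical stand-ins for whatever the response actually contains; the real schema should be taken from the API docs.

```python
def group_speaker_turns(segments: list[dict]) -> list[dict]:
    """Merge consecutive segments from the same speaker into one turn.

    Assumed, hypothetical segment shape:
    {"speaker": str, "start": float, "end": float, "text": str}
    with segments already ordered by start time.
    """
    turns: list[dict] = []
    for segment in segments:
        if turns and turns[-1]["speaker"] == segment["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + segment["text"]
            turns[-1]["end"] = segment["end"]
        else:
            # Speaker changed (or first segment): start a new turn.
            turns.append({
                "speaker": segment["speaker"],
                "start": segment["start"],
                "end": segment["end"],
                "text": segment["text"],
            })
    return turns


# Hypothetical segments for a short support call.
segments = [
    {"speaker": "S1", "start": 0.0, "end": 1.2, "text": "Hello,"},
    {"speaker": "S1", "start": 1.2, "end": 2.5, "text": "how can I help?"},
    {"speaker": "S2", "start": 2.8, "end": 4.0, "text": "My order is late."},
]
for turn in group_speaker_turns(segments):
    print(f'{turn["speaker"]} [{turn["start"]:.1f}-{turn["end"]:.1f}]: {turn["text"]}')
```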