Arabic STT API

Arabic speech-to-text API with diarization and dialects

Stream Arabic audio and get punctuated, diarized transcripts in real time across Modern Standard Arabic (MSA) and the Gulf, Egyptian and Levantine dialects, with native code-switching to English.

  • Real-time WebSocket + batch endpoints
  • Speaker diarization on by default
  • Word-level timestamps and confidence scores
  • Arabic↔English code-switching in one pass
  • Custom vocabulary / brand-name hinting
  • On-premises deployment for PDPL and GDPR workloads

Built for Arabic call centres and media

Generic STT models drop accuracy by 15–25 points the moment a call switches dialect or mixes English. OpenQlik's Arabic STT is trained on hundreds of thousands of hours of telephony, broadcast and conversational audio across MENA, so accuracy holds up where it matters: support calls, sales calls, podcasts, and broadcast captioning.

Endpoints

  • POST /v1/stt — batch transcription for files
  • WS /v1/stt/stream — real-time partials + finals
  • POST /v1/dubbing — combine STT + translation + TTS in one call

Example

# Batch-transcribe call.wav as Arabic with speaker labels (-F sends a multipart POST)
curl https://openqlik.bilytica.com/api/v1/stt \
  -H "Authorization: Bearer $OPENQLIK_KEY" \
  -F "audio=@call.wav" \
  -F "language=ar" -F "diarize=true"

Frequently asked questions

How accurate is Arabic transcription compared to English models?

On our internal MSA benchmark we measure a word error rate (WER) of 8–12%, and 14–20% on dialect audio depending on noise and code-switching. We publish per-dialect numbers in the docs.
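
For readers who want to reproduce a WER figure on their own recordings, the sketch below shows the standard definition: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function name is illustrative and not part of the API. The text normalization used in the internal benchmark is not described here, so this version simply splits on whitespace; stripping punctuation or diacritics first would change the score.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i reference words vs. an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("مرحبا كيف حالك اليوم", "مرحبا كيف حالك"))  # 0.25
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 100%.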

Does it handle code-switching between Arabic and English?

Yes. The same model decodes mixed Arabic-English utterances, which is the dominant pattern in Gulf customer-support calls.

Is real-time streaming supported?

Yes. WebSocket streaming (WS /v1/stt/stream) delivers partial results roughly every 100 ms, and the batch endpoint (POST /v1/stt) handles files up to several hours long.
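
On the client side, a stream of partials and finals is usually handled by replacing the in-flight partial on every update and committing text only when a final arrives. The sketch below shows that pattern in isolation, without a network connection. The message shape, a dict with "type" ("partial" or "final") and "text", is an assumption made for illustration; the actual wire format is not documented on this page.

```python
from dataclasses import dataclass, field


@dataclass
class LiveTranscript:
    """Accumulates streaming results (hypothetical message shape)."""
    finals: list = field(default_factory=list)  # committed utterances
    partial: str = ""                           # latest unconfirmed text

    def apply(self, message: dict) -> None:
        kind = message.get("type")
        text = message.get("text", "")
        if kind == "partial":
            # A newer partial supersedes the previous one entirely.
            self.partial = text
        elif kind == "final":
            # A final commits the utterance and clears the partial.
            self.finals.append(text)
            self.partial = ""
        # Any other message type is ignored in this sketch.

    def display(self) -> str:
        parts = [*self.finals, self.partial]
        return " ".join(p for p in parts if p)


transcript = LiveTranscript()
for msg in [
    {"type": "partial", "text": "hello"},
    {"type": "partial", "text": "hello and"},
    {"type": "final", "text": "Hello and welcome."},
    {"type": "partial", "text": "how can"},
]:
    transcript.apply(msg)
print(transcript.display())  # Hello and welcome. how can
```

Keeping committed and in-flight text separate lets a caption or agent-assist UI redraw only the tail of the transcript on each partial.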

Can transcripts include speaker labels?

Yes. Diarization is on by default and returns a speaker ID with each segment, plus word-level start and end timestamps.
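
As a rough illustration of working with diarized output, the sketch below merges consecutive segments from the same speaker into turns and prints them with timestamps. The response shape (a "segments" list whose items carry "speaker", "start", "end" and "text") and the speaker labels are assumptions for the example, not the documented schema.

```python
def speaker_turns(response: dict) -> list:
    """Merge adjacent segments that share a speaker ID into single turns."""
    turns = []
    for seg in response.get("segments", []):
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + seg["text"]
            turns[-1]["end"] = seg["end"]
        else:
            turns.append({
                "speaker": seg["speaker"],
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"],
            })
    return turns


def format_turn(turn: dict) -> str:
    return f"[{turn['start']:.1f}-{turn['end']:.1f}] {turn['speaker']}: {turn['text']}"


# Hypothetical response, shaped as assumed above.
sample = {
    "segments": [
        {"speaker": "S1", "start": 0.0, "end": 1.5, "text": "Thank you for calling."},
        {"speaker": "S1", "start": 1.5, "end": 2.4, "text": "How can I help?"},
        {"speaker": "S2", "start": 2.6, "end": 4.0, "text": "I need to update my order."},
    ]
}
for turn in speaker_turns(sample):
    print(format_turn(turn))
```

Merging at the turn level keeps call-centre transcripts readable while the start and end times of each turn remain available for audio playback.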
