Question 1

How accurate is Arabic transcription compared to English models?

Accepted Answer

On our internal Modern Standard Arabic (MSA) benchmark we measure a word error rate (WER) of 8–12%, and 14–20% on dialect audio, depending on noise and code-switching. We publish per-dialect numbers in the docs.
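For readers who want to reproduce a WER figure on their own audio, here is a minimal sketch of the standard word-level calculation (edit distance divided by reference length). It is not the benchmark code behind the numbers above; the function name `word_error_rate` is illustrative, and it assumes both transcripts have already been normalized the same way (for Arabic, choices such as stripping diacritics can noticeably change the result).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words gives 25% WER.
print(word_error_rate("مرحبا بكم في الخدمة", "مرحبا بك في الخدمة"))
```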

Question 2

Does it handle code-switching between Arabic and English?

Accepted Answer

Yes. The same model decodes mixed Arabic-English utterances, which is the dominant pattern in Gulf customer-support calls.

Question 3

Is real-time streaming supported?

Accepted Answer

Yes. WebSocket streaming returns partial results roughly every 100 ms, and a batch endpoint handles files up to several hours long.
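On the client side, partial results are typically folded into a running transcript: each new partial replaces the previous one until a final result arrives. The sketch below shows that pattern only. The message shape (a `text` field and an `is_final` flag) and the name `fold_stream` are assumptions made for illustration, not the documented wire format, so check the API reference for the actual field names.

```python
from typing import Iterable


def fold_stream(messages: Iterable[dict]) -> tuple[list[str], str]:
    """Return (finalized segments, latest unfinalized partial text).

    Assumes each message looks like {"text": str, "is_final": bool};
    this shape is hypothetical.
    """
    finals: list[str] = []
    pending = ""
    for msg in messages:
        if msg.get("is_final"):
            # A final result closes the current segment.
            finals.append(msg["text"])
            pending = ""
        else:
            # A newer partial supersedes the previous partial.
            pending = msg["text"]
    return finals, pending


# Illustrative messages, not real API output.
demo = [
    {"text": "hel", "is_final": False},
    {"text": "hello there", "is_final": True},
    {"text": "how", "is_final": False},
]
print(fold_stream(demo))
```

In a live client the same loop would run over messages read from the WebSocket connection instead of a list.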

Question 4

Can transcripts include speaker labels?

Accepted Answer

Yes. Diarization is on by default and returns a speaker ID with each segment, along with word-level start and end timestamps.
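A common first step with diarized output is to merge consecutive segments from the same speaker into turns. The following is a minimal sketch under assumed field names (`speaker`, `start`, `end`, `text`); the real response schema may differ, and `to_turns` is a hypothetical helper, not part of the API.

```python
def to_turns(segments: list[dict]) -> list[dict]:
    """Merge adjacent segments that share a speaker ID into single turns.

    Assumes each segment looks like
    {"speaker": str, "start": float, "end": float, "text": str};
    this shape is hypothetical.
    """
    turns: list[dict] = []
    for seg in segments:
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"]
        else:
            # Speaker changed: start a new turn (copy so the input is not mutated).
            turns.append({
                "speaker": seg["speaker"],
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"],
            })
    return turns


# Illustrative segments, not real API output.
demo = [
    {"speaker": "S1", "start": 0.0, "end": 1.0, "text": "hello"},
    {"speaker": "S1", "start": 1.0, "end": 2.0, "text": "there"},
    {"speaker": "S2", "start": 2.0, "end": 3.0, "text": "hi"},
]
for turn in to_turns(demo):
    print(f'{turn["speaker"]} [{turn["start"]:.1f}-{turn["end"]:.1f}]: {turn["text"]}')
```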

Arabic speech-to-text API with diarization and dialects

Built for Arabic call centres and media

Endpoints

Example

Frequently asked questions

Related solutions