
Run OpenQlik in your own datacenter

We are the only premium voice AI platform built end-to-end on open-source models, so the entire stack — LLM, TTS, STT, real-time voice, and orchestration — can be deployed inside your perimeter. SaaS, private cloud, or fully air-gapped.

Data never leaves your perimeter
100% open-source model stack
SaaS, private cloud, or air-gapped

Reference architecture

Every component runs inside your perimeter. Nothing phones home.

Customer perimeter — VPC / datacenter / air-gap

Edge
  • Load Balancer · WAF + TLS
Control Plane
  • OpenQlik Console · API Gateway · Agent Orchestrator
Inference (GPU)
  • LLM (Llama / Mistral) · TTS (XTTS) · STT (Whisper) · Realtime Voice
Data Plane
  • PostgreSQL · pgvector / Qdrant · MinIO / S3 · Redis
Observability + Security
  • SSO / SAML / LDAP · Audit + PII redaction · Prometheus / Grafana · Loki / ELK

Components shipped

LLM Runtime
Llama 3, Mistral, Qwen, Phi — vLLM / TGI
Realtime Voice
LiveKit + VAD, sub-second barge-in
TTS / STT
XTTS, StyleTTS2, Whisper, Distil-Whisper
Data Layer
PostgreSQL, Redis, pgvector / Qdrant, MinIO
Governance
SSO/SAML/LDAP, RBAC, audit logs, PII redaction
Ops
Prometheus, Grafana, Loki, Helm-managed upgrades

Supported environments

From a laptop pilot to a multi-region GPU cluster.

Pilot / SMB

Docker Compose

Single-node deployment for pilots, demos, and small production workloads.

  • Single-command bring-up via docker compose up
  • Bundled PostgreSQL, Redis, MinIO, vector DB
  • GPU passthrough via NVIDIA Container Toolkit
  • Suggested GPU: 1× 24 GB (e.g. NVIDIA L4, A10 or RTX 4090) — sized to your model choice
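A single-node bring-up can be sketched as below. The `.env` keys and the `llm` service name are assumptions for illustration, not OpenQlik's shipped configuration:

```shell
# Hypothetical single-node bring-up; variable names and the `llm` service
# are illustrative, not OpenQlik's actual distribution.
cat > .env <<'EOF'
OPENQLIK_MODEL=llama-3-8b-instruct
OPENQLIK_GPU_COUNT=1
EOF

# Bring the stack up (requires Docker and the NVIDIA Container Toolkit):
#   docker compose --env-file .env up -d
# Verify GPU visibility from inside a container:
#   docker compose exec llm nvidia-smi
```

Model choice drives the GPU requirement: an 8B-class LLM plus Whisper and XTTS fits comfortably on one 24 GB card.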
Recommended

Kubernetes

Helm charts for HA, autoscaling, and multi-tenant isolation across GPU node pools.

  • Official Helm chart with values overrides per environment
  • Horizontal autoscaling on inference + control plane
  • Works with EKS, AKS, GKE, OpenShift, Rancher, vanilla k8s
  • GPU operator + node selectors for mixed CPU/GPU pools
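A per-environment values override can be sketched as follows; the chart name, repo, and value keys are assumptions for this sketch, not the published chart's actual schema:

```shell
# Illustrative Helm values override for a production environment.
# Key names (inference, autoscaling, nodeSelector) are assumptions.
cat > values-prod.yaml <<'EOF'
inference:
  replicas: 2
  nodeSelector:
    nvidia.com/gpu.present: "true"
autoscaling:
  enabled: true
  maxReplicas: 8
EOF

# Then install into its own namespace:
#   helm install openqlik openqlik/openqlik -n openqlik --create-namespace -f values-prod.yaml
```

Keeping one values file per environment (dev, staging, prod) is what makes the same chart reusable across clusters.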
Air-Gapped / Sovereign

Bare Metal

Offline installer for fully air-gapped, sovereign, or regulated environments.

  • Offline tarball with all images, weights, and dependencies
  • Systemd-based service supervision
  • Datacenter GPUs: NVIDIA H100 / A100 / L40S, AMD MI300 (ROCm) — sizing depends on workload
  • Optional HA via keepalived + Patroni for Postgres
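Systemd supervision on bare metal might look like the sketch below; the unit name, binary path, and flags are hypothetical, not the offline installer's actual layout:

```shell
# Hypothetical systemd unit for an inference service; paths and names
# are illustrative assumptions.
cat > openqlik-inference.service <<'EOF'
[Unit]
Description=OpenQlik inference (vLLM)
After=network-online.target

[Service]
ExecStart=/opt/openqlik/bin/inference --weights /opt/openqlik/weights
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Install and start (as root):
#   cp openqlik-inference.service /etc/systemd/system/
#   systemctl daemon-reload && systemctl enable --now openqlik-inference
```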
Hardware sizing guide

Pick a tier based on concurrent voice sessions

GPU recommendations depend on the model tier you choose. These are reference configurations our deployment engineers use; final sizing is confirmed during discovery.

Small (pilot): up to ~10 concurrent voice sessions
  • GPU: 1× 24 GB GPU
  • Examples: NVIDIA L4 · A10 · RTX 4090
  • Model fit: small open-source models (Llama 3 8B, Whisper, XTTS)

Mid (production): up to ~100 concurrent voice sessions
  • GPU: 2–4× 48–80 GB GPUs
  • Examples: NVIDIA L40S · A100 80 GB · AMD MI300
  • Model fit: mid-tier models (Llama 3 70B, Qwen 72B) with autoscaling

Large (enterprise): 1,000+ concurrent voice sessions
  • GPU: 8 or more 80 GB GPUs across nodes
  • Examples: NVIDIA H100 / H200 · AMD MI300X clusters
  • Model fit: frontier-class workloads, multi-region HA, 24/7 SLA

CPU, RAM and storage scale with the same tier — see the installation checklist for the full bill of materials.
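The tier thresholds above can be expressed as a small helper, useful in sizing scripts (the cutoffs mirror the table; anything beyond the mid tier is treated as large):

```shell
# Tier picker following the concurrency thresholds in the sizing guide.
pick_tier() {
  if   [ "$1" -le 10 ];  then echo small
  elif [ "$1" -le 100 ]; then echo mid
  else                        echo large
  fi
}

pick_tier 8     # -> small
pick_tier 250   # -> large
```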

Enterprise installation checklist

Production-ready in 4 phases

The same checklist our deployment engineers use with banks, telcos, and healthcare customers.

1. Discovery & sizing

  • Confirm expected concurrent voice sessions and TTS/STT minutes per month
  • Inventory available GPU SKUs and pick a tier from the sizing guide above
  • Choose deployment topology: Docker Compose / Kubernetes / bare-metal
  • Identify air-gap, sovereignty, or regulatory requirements (HIPAA, GDPR, PDPL)

2. Infrastructure prerequisites

  • Linux hosts (Ubuntu 22.04+ / RHEL 9+) with kernel 5.15+
  • NVIDIA driver 535+ and Container Toolkit, or AMD ROCm 6+
  • Kubernetes 1.28+ (if k8s) with GPU operator and a CSI storage class
  • PostgreSQL 15+, Redis 7+, S3-compatible object storage (MinIO supported)
  • Internal DNS, TLS certificates, and a load balancer (NGINX/HAProxy/F5)
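The version minimums above can be checked with a small preflight helper; the comparison uses `sort -V`, and the queried commands are examples rather than a complete preflight suite:

```shell
# Preflight sketch: exits 0 from ver_ok when installed >= minimum,
# using version-aware sorting.
ver_ok() {  # usage: ver_ok <installed> <minimum>
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

ver_ok "$(uname -r | cut -d- -f1)" 5.15 || echo "kernel older than 5.15"
# ver_ok "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)" 535 \
#   || echo "NVIDIA driver older than 535"
```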

3. Identity & security

  • Wire SSO via SAML 2.0, OIDC, or LDAP/AD
  • Define RBAC roles, workspace boundaries, and per-tenant quotas
  • Enable audit log shipping to your SIEM (Splunk, ELK, Sentinel)
  • Configure PII redaction policies and data retention windows
  • Generate offline license + signing keys for air-gapped environments
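A redaction and retention policy might be captured as a declarative file like the sketch below; the format and key names are assumptions for illustration, not OpenQlik's actual schema:

```shell
# Illustrative PII redaction + retention policy; keys are assumed, not
# OpenQlik's documented configuration format.
cat > redaction-policy.yaml <<'EOF'
redact:
  - type: email
  - type: phone
  - type: credit_card
retention:
  transcripts_days: 30
  audit_logs_days: 365
EOF

# Audit logs are then shipped to the SIEM per your collector's own docs.
```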

4. Install & validate

  • Pull or sideload OpenQlik images and open-source model weights
  • Run helm install openqlik or the offline installer bundle
  • Smoke test: TTS, STT, agent orchestration, real-time voice round-trip
  • Load test target concurrency with the bundled k6 scenarios
  • Hand off runbooks for upgrades, backups, and incident response
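The smoke-test step can be sketched as below; the base URL and endpoint paths are assumptions, not OpenQlik's documented API:

```shell
# Smoke-test sketch; endpoints are hypothetical placeholders.
BASE=https://openqlik.internal.example

# STT round-trip:  curl -s -F audio=@hello.wav "$BASE/v1/stt"
# TTS round-trip:  curl -s -d '{"text":"hello"}' "$BASE/v1/tts" -o out.wav
# Health check: treat a reply as healthy only if it reports "status":"ok".
check_health() { printf '%s' "$1" | grep -q '"status":"ok"'; }
check_health '{"status":"ok","version":"1.2.3"}' && echo "control plane: ok"
```

A real run would then replay the bundled k6 scenarios against `$BASE` to confirm the target concurrency before handoff.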

Ready for an on-prem deployment?

Our solutions team will run sizing, draft a topology, and ship a pilot in under 2 weeks.