
Run OpenQlik in your own datacenter

We are the only premium voice AI platform built end-to-end on open-source models, so the entire stack — LLM, TTS, STT, real-time voice, and orchestration — can be deployed inside your perimeter. SaaS, private cloud, or fully air-gapped.

Data never leaves your perimeter
100% open-source model stack
SaaS, private cloud, or air-gapped

Reference architecture

Every component runs inside your perimeter. Nothing phones home.

Customer perimeter — VPC / datacenter / air-gap

Edge
  • Load Balancer · WAF + TLS
Control Plane
  • OpenQlik Console · API Gateway · Agent Orchestrator
Inference (GPU)
  • LLM (Llama / Mistral) · TTS (XTTS) · STT (Whisper) · Realtime Voice
Data Plane
  • PostgreSQL · pgvector / Qdrant · MinIO / S3 · Redis
Observability + Security
  • SSO / SAML / LDAP · Audit + PII redaction · Prometheus / Grafana · Loki / ELK

Components shipped

LLM Runtime
Llama 3, Mistral, Qwen, Phi — vLLM / TGI
Realtime Voice
LiveKit + VAD, sub-second barge-in
TTS / STT
XTTS, StyleTTS2, Whisper, Distil-Whisper
Data Layer
PostgreSQL, Redis, pgvector / Qdrant, MinIO
Governance
SSO/SAML/LDAP, RBAC, audit logs, PII redaction
Ops
Prometheus, Grafana, Loki, Helm-managed upgrades

Supported environments

From a laptop pilot to a multi-region GPU cluster.

Pilot / SMB

Docker Compose

Single-node deployment for pilots, demos, and small production workloads.

  • Single-command bring-up via docker compose up
  • Bundled PostgreSQL, Redis, MinIO, vector DB
  • GPU passthrough via NVIDIA Container Toolkit
  • Suggested GPU: 1× 24 GB (e.g. NVIDIA L4, A10 or RTX 4090) — sized to your model choice
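A single-node bring-up can be sketched as below. The `.env` keys and the `llm` service name are assumptions for illustration, not OpenQlik's shipped configuration:

```shell
# Hypothetical single-node bring-up; variable names and the `llm` service
# are illustrative, not OpenQlik's actual distribution.
cat > .env <<'EOF'
OPENQLIK_MODEL=llama-3-8b-instruct
OPENQLIK_GPU_COUNT=1
EOF

# Bring the stack up (requires Docker and the NVIDIA Container Toolkit):
#   docker compose --env-file .env up -d
# Verify GPU visibility from inside a container:
#   docker compose exec llm nvidia-smi
```

Model choice drives the GPU requirement: an 8B-class LLM plus Whisper and XTTS fits comfortably on one 24 GB card.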
Recommended

Kubernetes

Helm charts for HA, autoscaling, and multi-tenant isolation across GPU node pools.

  • Official Helm chart with values overrides per environment
  • Horizontal autoscaling on inference + control plane
  • Works with EKS, AKS, GKE, OpenShift, Rancher, vanilla k8s
  • GPU operator + node selectors for mixed CPU/GPU pools
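A per-environment values override can be sketched as follows; the chart name, repo, and value keys are assumptions for this sketch, not the published chart's actual schema:

```shell
# Illustrative Helm values override for a production environment.
# Key names (inference, autoscaling, nodeSelector) are assumptions.
cat > values-prod.yaml <<'EOF'
inference:
  replicas: 2
  nodeSelector:
    nvidia.com/gpu.present: "true"
autoscaling:
  enabled: true
  maxReplicas: 8
EOF

# Then install into its own namespace:
#   helm install openqlik openqlik/openqlik -n openqlik --create-namespace -f values-prod.yaml
```

Keeping one values file per environment (dev, staging, prod) is what makes the same chart reusable across clusters.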
Air-Gapped / Sovereign

Bare Metal

Offline installer for fully air-gapped, sovereign, or regulated environments.

  • Offline tarball with all images, weights, and dependencies
  • Systemd-based service supervision
  • Datacenter GPUs: NVIDIA H100 / A100 / L40S, AMD MI300 (ROCm) — sizing depends on workload
  • Optional HA via keepalived + Patroni for Postgres
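Systemd supervision on bare metal might look like the sketch below; the unit name, binary path, and flags are hypothetical, not the offline installer's actual layout:

```shell
# Hypothetical systemd unit for an inference service; paths and names
# are illustrative assumptions.
cat > openqlik-inference.service <<'EOF'
[Unit]
Description=OpenQlik inference (vLLM)
After=network-online.target

[Service]
ExecStart=/opt/openqlik/bin/inference --weights /opt/openqlik/weights
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Install and start (as root):
#   cp openqlik-inference.service /etc/systemd/system/
#   systemctl daemon-reload && systemctl enable --now openqlik-inference
```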
Hardware sizing guide

Pick a tier based on concurrent voice sessions

GPU recommendations depend on the model tier you choose. These are reference configurations our deployment engineers use; final sizing is confirmed during discovery.

Small (pilot): up to ~10 concurrent voice sessions
  • GPU: 1× 24 GB GPU
  • Examples: NVIDIA L4 · A10 · RTX 4090
  • Model fit: small open-source models (Llama 3 8B, Whisper, XTTS)

Mid (production): up to ~100 concurrent voice sessions
  • GPU: 2–4× 48–80 GB GPUs
  • Examples: NVIDIA L40S · A100 80 GB · AMD MI300
  • Model fit: mid-tier models (Llama 3 70B, Qwen 72B) with autoscaling

Large (enterprise): 1,000+ concurrent voice sessions
  • GPU: 8 or more 80 GB GPUs across nodes
  • Examples: NVIDIA H100 / H200 · AMD MI300X clusters
  • Model fit: frontier-class workloads, multi-region HA, 24/7 SLA

CPU, RAM and storage scale with the same tier — see the installation checklist for the full bill of materials.
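The tier thresholds above can be expressed as a small helper, useful in sizing scripts (the cutoffs mirror the table; anything beyond the mid tier is treated as large):

```shell
# Tier picker following the concurrency thresholds in the sizing guide.
pick_tier() {
  if   [ "$1" -le 10 ];  then echo small
  elif [ "$1" -le 100 ]; then echo mid
  else                        echo large
  fi
}

pick_tier 8     # -> small
pick_tier 250   # -> large
```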

Enterprise installation checklist

Production-ready in 4 phases

The same checklist our deployment engineers use with banks, telcos, and healthcare customers.

1. Discovery & sizing

  • Confirm expected concurrent voice sessions and TTS/STT minutes per month
  • Inventory available GPU SKUs and pick a tier from the sizing guide above
  • Choose deployment topology: Docker Compose / Kubernetes / bare-metal
  • Identify air-gap, sovereignty, or regulatory requirements (HIPAA, GDPR, PDPL)

2. Infrastructure prerequisites

  • Linux hosts (Ubuntu 22.04+ / RHEL 9+) with kernel 5.15+
  • NVIDIA driver 535+ and Container Toolkit, or AMD ROCm 6+
  • Kubernetes 1.28+ (if k8s) with GPU operator and a CSI storage class
  • PostgreSQL 15+, Redis 7+, S3-compatible object storage (MinIO supported)
  • Internal DNS, TLS certificates, and a load balancer (NGINX/HAProxy/F5)
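The version minimums above can be checked with a small preflight helper; the comparison uses `sort -V`, and the queried commands are examples rather than a complete preflight suite:

```shell
# Preflight sketch: exits 0 from ver_ok when installed >= minimum,
# using version-aware sorting.
ver_ok() {  # usage: ver_ok <installed> <minimum>
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

ver_ok "$(uname -r | cut -d- -f1)" 5.15 || echo "kernel older than 5.15"
# ver_ok "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)" 535 \
#   || echo "NVIDIA driver older than 535"
```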

3. Identity & security

  • Wire SSO via SAML 2.0, OIDC, or LDAP/AD
  • Define RBAC roles, workspace boundaries, and per-tenant quotas
  • Enable audit log shipping to your SIEM (Splunk, ELK, Sentinel)
  • Configure PII redaction policies and data retention windows
  • Generate offline license + signing keys for air-gapped environments
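A redaction and retention policy might be captured as a declarative file like the sketch below; the format and key names are assumptions for illustration, not OpenQlik's actual schema:

```shell
# Illustrative PII redaction + retention policy; keys are assumed, not
# OpenQlik's documented configuration format.
cat > redaction-policy.yaml <<'EOF'
redact:
  - type: email
  - type: phone
  - type: credit_card
retention:
  transcripts_days: 30
  audit_logs_days: 365
EOF

# Audit logs are then shipped to the SIEM per your collector's own docs.
```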

4. Install & validate

  • Pull or sideload OpenQlik images and open-source model weights
  • Run helm install openqlik or the offline installer bundle
  • Smoke test: TTS, STT, agent orchestration, real-time voice round-trip
  • Load test target concurrency with the bundled k6 scenarios
  • Hand off runbooks for upgrades, backups, and incident response
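The smoke-test step can be sketched as below; the base URL and endpoint paths are assumptions, not OpenQlik's documented API:

```shell
# Smoke-test sketch; endpoints are hypothetical placeholders.
BASE=https://openqlik.internal.example

# STT round-trip:  curl -s -F audio=@hello.wav "$BASE/v1/stt"
# TTS round-trip:  curl -s -d '{"text":"hello"}' "$BASE/v1/tts" -o out.wav
# Health check: treat a reply as healthy only if it reports "status":"ok".
check_health() { printf '%s' "$1" | grep -q '"status":"ok"'; }
check_health '{"status":"ok","version":"1.2.3"}' && echo "control plane: ok"
```

A real run would then replay the bundled k6 scenarios against `$BASE` to confirm the target concurrency before handoff.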

Ready for an on-prem deployment?

Our solutions team will run sizing, draft a topology, and ship a pilot in under 2 weeks.