Hafiz API
A single REST endpoint that turns Qur'anic recitation audio into text with word-level timing and confidence. No model hosting, no GPU, no training. Just POST a wav file.
What you'll get
- Speaker-independent accuracy on Qur'anic Arabic: built on tarteel-ai/whisper-base-ar-quran and verified to produce identical transcripts across professional reciters (Sudais, Husary) and unseen voices in our internal evals.
- Word-level timestamps so you can build live highlight, scrubbing, or alignment-based grading.
- Diacritic restoration — pass the target ayah and we return fully vowelised (Mushaf-style) Arabic.
- Sub-second latency for short ayahs: we measured 85–270 ms per ayah on a consumer GPU, and our GPU cloud will be faster.
- Pay only for what you transcribe. Free tier covers 60 minutes/month — enough for hobby projects forever.
Quickstart — first transcription in 60 seconds
Send a wav/mp3/m4a file to /v1/transcribe with your API key. Response comes back as JSON.
curl:

    curl https://api.hafiz-ai.com/v1/transcribe \
      -H "Authorization: Bearer hafiz_sk_live_..." \
      -F "file=@recitation.wav" \
      -F "language=ar" \
      -F "return_timestamps=true"
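Python:

    import requests

    # Open the file in a context manager so the handle is closed after upload.
    with open("recitation.wav", "rb") as audio:
        resp = requests.post(
            "https://api.hafiz-ai.com/v1/transcribe",
            headers={"Authorization": "Bearer hafiz_sk_live_..."},
            files={"file": audio},
            # Send the string "true" to match the curl example;
            # a Python bool would be form-encoded as "True".
            data={"language": "ar", "return_timestamps": "true"},
        )
    resp.raise_for_status()
    print(resp.json())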
Node.js:

    import fs from "node:fs";

    const form = new FormData();
    // Pass a filename so the part is not uploaded under the default name "blob".
    form.append("file", new Blob([fs.readFileSync("recitation.wav")]), "recitation.wav");
    form.append("language", "ar");
    form.append("return_timestamps", "true");

    const resp = await fetch("https://api.hafiz-ai.com/v1/transcribe", {
      method: "POST",
      headers: { Authorization: "Bearer hafiz_sk_live_..." },
      body: form,
    });
    console.log(await resp.json());
Swift:

    // makeMultipart(boundary:fileURL:) and TranscriptionResponse are not provided
    // by the API; define them in your own app.
    var req = URLRequest(url: URL(string: "https://api.hafiz-ai.com/v1/transcribe")!)
    req.httpMethod = "POST"
    req.setValue("Bearer hafiz_sk_live_...", forHTTPHeaderField: "Authorization")

    let boundary = "hafiz-\(UUID().uuidString)"
    req.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")
    req.httpBody = makeMultipart(boundary: boundary, fileURL: recordingURL)

    let (data, _) = try await URLSession.shared.data(for: req)
    let result = try JSONDecoder().decode(TranscriptionResponse.self, from: data)
You'll get back something like:
    {
      "text": "بسم الله الرحمن الرحيم",
      "language": "ar",
      "audio_seconds": 3.84,
      "inference_seconds": 0.71,
      "model": "whisper-quran-everyayah-5reciters-v1",
      "words": [
        { "word": "بسم", "start": 0.12, "end": 0.61, "confidence": 0.97 },
        { "word": "الله", "start": 0.62, "end": 1.18, "confidence": 0.99 },
        { "word": "الرحمن", "start": 1.21, "end": 2.02, "confidence": 0.98 },
        { "word": "الرحيم", "start": 2.05, "end": 3.79, "confidence": 0.96 }
      ]
    }
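Word timings like these are enough to drive a live highlight. The following is a minimal sketch, assuming the response has the shape shown above; `active_word_index` is a hypothetical helper name, not something the API provides.

```python
from bisect import bisect_right


def active_word_index(words, t):
    """Return the index of the word being recited at time t (seconds), or None."""
    starts = [w["start"] for w in words]
    i = bisect_right(starts, t) - 1  # last word whose start is <= t
    if i >= 0 and t <= words[i]["end"]:
        return i
    return None  # before the first word, or in a pause between words


# The "words" array from the sample response above.
words = [
    {"word": "بسم", "start": 0.12, "end": 0.61, "confidence": 0.97},
    {"word": "الله", "start": 0.62, "end": 1.18, "confidence": 0.99},
    {"word": "الرحمن", "start": 1.21, "end": 2.02, "confidence": 0.98},
    {"word": "الرحيم", "start": 2.05, "end": 3.79, "confidence": 0.96},
]

idx = active_word_index(words, 1.50)  # playback position in seconds
```

Call it on every playback tick and highlight `words[idx]` whenever `idx` is not `None`. The binary search keeps the lookup cheap even for long recordings.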
Authentication
Every request must include an Authorization header with your secret key:
Authorization: Bearer hafiz_sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Keys come in two flavours:
- hafiz_sk_test_…: sandbox, free, no charges, deterministic mocked transcripts for CI.
- hafiz_sk_live_…: production, billed per second of audio.
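Because the prefix tells you whether a key is billable, a client can check it before sending anything. This is a small sketch under that assumption; `auth_headers` and `is_billable` are hypothetical helper names, and the key in the usage line is a placeholder.

```python
TEST_PREFIX = "hafiz_sk_test_"
LIVE_PREFIX = "hafiz_sk_live_"


def auth_headers(api_key):
    """Build the Authorization header, rejecting keys with an unknown prefix."""
    if not api_key.startswith((TEST_PREFIX, LIVE_PREFIX)):
        raise ValueError("expected a hafiz_sk_test_ or hafiz_sk_live_ key")
    return {"Authorization": f"Bearer {api_key}"}


def is_billable(api_key):
    """Live keys are billed per second of audio; test keys are free."""
    return api_key.startswith(LIVE_PREFIX)


headers = auth_headers("hafiz_sk_test_placeholder")  # placeholder, not a real key
```

A CI job could assert `not is_billable(key)` at startup to make sure it never runs against production billing.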
Transcribe an audio file
POST /v1/transcribe returns the full transcription of a recording. Best for whole-ayah submissions or post-recitation analysis.
Multipart body
| Field | Type | Description |
|---|---|---|
| file (required) | file | Audio file. Supported: wav, mp3, m4a, flac, ogg. Max 25 MB / 5 minutes. |
| language (optional) | string | Defaults to ar. Reserved for future multilingual variants. |
| return_timestamps (optional) | bool | Defaults to true. Include word-level start/end timing. |
| target_ayah (optional) | string | The target Mushaf ayah text. If supplied, the response includes a diacritized_text field with restored ḥarakāt aligned to your target. |
| max_new_tokens (optional) | integer | Defaults to 225. Range 32–448. |
| webhook_url (optional) | string | If set, the API returns 202 Accepted immediately and POSTs the result to this URL when ready. Recommended for files >30s. |
Example
    curl -X POST https://api.hafiz-ai.com/v1/transcribe \
      -H "Authorization: Bearer $HAFIZ_KEY" \
      -F "file=@al-fatiha-1.wav" \
      -F "target_ayah=بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ"
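Checking the documented limits on the client avoids a round trip that would end in a 400. Here is a minimal sketch, assuming the field names and ranges from the table above; `build_transcribe_fields` is a hypothetical helper, and reading "25 MB" as 25,000,000 bytes is a deliberately conservative assumption.

```python
MAX_BYTES = 25 * 1000 * 1000  # assumption: conservative reading of the 25 MB limit


def build_transcribe_fields(file_size, target_ayah=None, max_new_tokens=225,
                            return_timestamps=True, webhook_url=None):
    """Validate inputs against the documented limits and return the form fields."""
    if file_size > MAX_BYTES:
        raise ValueError("file exceeds the 25 MB limit")
    if not 32 <= max_new_tokens <= 448:
        raise ValueError("max_new_tokens must be in the range 32-448")
    fields = {
        "language": "ar",
        "return_timestamps": "true" if return_timestamps else "false",
        "max_new_tokens": str(max_new_tokens),
    }
    # Optional fields are only sent when the caller supplies them.
    if target_ayah:
        fields["target_ayah"] = target_ayah
    if webhook_url:
        fields["webhook_url"] = webhook_url
    return fields


fields = build_transcribe_fields(file_size=1_200_000, max_new_tokens=128)
```

The returned dictionary can be passed as the `data=` argument of `requests.post`, alongside `files={"file": ...}` as in the quickstart.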
Live partial transcription
Optimised for low-latency progressive transcription. Send 2–6 second chunks of audio every few seconds and get back what's been recited so far. This is the path Iqra's "Hafiz Live" mode will use.
POST /v1/transcribe_partial has the same shape as /v1/transcribe but uses a smaller token budget (default 120) and skips post-processing, for sub-300 ms turnaround on most M-series Macs.
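One way to plan the chunking on the client is sketched below. It assumes chunks are consecutive and non-overlapping, which the text above does not specify; `chunk_bounds` is a hypothetical helper name.

```python
def chunk_bounds(total_seconds, chunk_seconds=4.0):
    """Split a recording into consecutive (start, end) windows, in seconds."""
    if not 2.0 <= chunk_seconds <= 6.0:
        raise ValueError("chunk_seconds should stay within the 2-6 second range")
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    # Note: the final window may be shorter than 2 seconds.
    return bounds


schedule = chunk_bounds(10.0, chunk_seconds=4.0)
```

Each window would then be cut from the recording buffer and uploaded as its own request. Whether the server keeps state between chunks is not stated here, so treat each upload as independent unless the API tells you otherwise.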
Service health
GET /v1/health returns { "status": "ok", "ready": true, "backend": "google-cloud-speech-v1", "model": "...", "language": "ar-SA" }. No auth required. Suitable for uptime monitors.
Dashboard & billing endpoints
These are the endpoints used by the developer dashboard at auth.hafiz-ai.com/dashboard.html. They require a Firebase ID token in the Authorization header (issued automatically on sign-in) rather than an API key. They aren't meant for direct programmatic use, but we document them here in case you want to wire them into a custom dashboard.
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/bootstrap | Idempotent: create / update your user record on first sign-in. |
| GET | /v1/me | Returns your plan, usage this period, and email. |
| POST | /v1/keys | Mint a new API key (form field livemode=true\|false). Plaintext returned once. |
| GET | /v1/keys | List the prefixes & metadata of all your keys. |
| DELETE | /v1/keys/{id} | Revoke a key by its hash id. |
| POST | /v1/billing/checkout | Start a Stripe Checkout session. Body: plan, success_url, cancel_url. |
| POST | /v1/billing/portal | Open the Stripe customer portal. Body: return_url. |
| POST | /v1/billing/webhook | Stripe → Hafiz webhook receiver. Verifies Stripe-Signature. |
A /v1/align endpoint (per-word verdicts: correct, near_match, substituted, missed, extra) is planned for Q3 2026. Today this logic ships inside the iOS app. In the meantime, pass target_ayah to /v1/transcribe and use the per-word confidence values to grade server-side.
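Until that endpoint exists, a rough server-side grader can be built from the transcript words and their confidence values. The sketch below is not the logic that ships in the iOS app. It is an illustration that borrows the verdict names above, and it assumes that a low-confidence exact match counts as near_match and that 0.90 is a reasonable threshold. Both are arbitrary choices.

```python
import difflib
import unicodedata


def strip_diacritics(text):
    """Drop combining marks (harakat) and map alef wasla to plain alef."""
    stripped = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    return stripped.replace("\u0671", "\u0627")


def grade(target_ayah, words, min_confidence=0.90):
    """Return (word, verdict) pairs; the threshold and mapping are illustrative."""
    target = strip_diacritics(target_ayah).split()
    heard = [w["word"] for w in words]
    matcher = difflib.SequenceMatcher(a=target, b=heard, autojunk=False)
    verdicts = []
    for op, a0, a1, b0, b1 in matcher.get_opcodes():
        if op == "equal":
            for k in range(a1 - a0):
                conf = words[b0 + k]["confidence"]
                label = "correct" if conf >= min_confidence else "near_match"
                verdicts.append((target[a0 + k], label))
        elif op == "replace":
            verdicts += [(t, "substituted") for t in target[a0:a1]]
        elif op == "delete":
            verdicts += [(t, "missed") for t in target[a0:a1]]
        elif op == "insert":
            verdicts += [(h, "extra") for h in heard[b0:b1]]
    return verdicts


# Usage with the target ayah from the example request and a partial recitation.
recited = [
    {"word": "بسم", "start": 0.12, "end": 0.61, "confidence": 0.97},
    {"word": "الله", "start": 0.62, "end": 1.18, "confidence": 0.99},
]
report = grade("بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ", recited)
```

The target is stripped of diacritics before comparison because the plain transcript has none. A production grader would need more careful Arabic normalisation than this.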
Response shape
| Field | Type | Description |
|---|---|---|
| text | string | Plain Arabic transcript (no diacritics by default). |
| diacritized_text | string? | Present only if target_ayah was supplied. Returns Mushaf-style fully-vowelised Arabic. |
| words | array | Each: { word, start, end, confidence }. Times in seconds. |
| language | string | Detected/forced language. Always "ar" in v1. |
| audio_seconds | number | Length of the input audio. This is what gets billed. |
| inference_seconds | number | Wall-clock time we spent on the GPU. |
| model | string | Versioned model identifier. Pin against this in production. |
| request_id | string | Surface this in support tickets. |
Errors
All errors return JSON of shape { "error": { "code": "...", "message": "..." } }.
| HTTP | Code | Meaning |
|---|---|---|
| 400 | invalid_audio | File could not be decoded, or is > 25 MB / 5 min. |
| 401 | missing_key | No Authorization header. |
| 401 | invalid_key | Key is malformed or revoked. |
| 402 | quota_exceeded | Free tier consumed for the month, or paid card declined. |
| 429 | rate_limited | Too many concurrent requests. Honor Retry-After. |
| 500 | internal | Our side. Auto-reported. Safe to retry once. |
| 503 | queue_full | Spike capacity hit. Retry with exponential backoff. |
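The retry guidance in the table can be folded into one small policy function. This is a sketch under stated assumptions: Retry-After is taken to be a number of seconds (it can also be an HTTP date, which is not handled here), and the base delay, cap, and attempt limit are arbitrary values chosen for illustration.

```python
def retry_delay(status, attempt, retry_after=None,
                base=1.0, cap=30.0, max_attempts=5):
    """Seconds to wait before retry number `attempt` (1-based), or None to give up."""
    if status == 500:
        # "Safe to retry once": only the first retry is allowed.
        return base if attempt == 1 else None
    if attempt > max_attempts:
        return None
    if status == 429:
        # Honor the server's Retry-After value when it is present.
        return float(retry_after) if retry_after is not None else base
    if status == 503:
        # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
        return min(cap, base * 2 ** (attempt - 1))
    return None  # 400, 401, 402: retrying will not help


delay = retry_delay(503, attempt=3)
```

Adding a little random jitter to the 503 delay would help avoid synchronized retries from many clients; it is left out here to keep the sketch deterministic.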
Rate limits
Limits are per-key and reset every 60 seconds. Each response includes:
- X-RateLimit-Limit: the cap.
- X-RateLimit-Remaining: calls left in the window.
- X-RateLimit-Reset: Unix epoch when the window resets.
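A client can use these headers to pause before it hits a 429. Here is a minimal sketch, assuming the header values are plain integers as described above; `seconds_until_reset` is a hypothetical helper name.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to pause before the next call, or 0.0 if calls remain."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # Window exhausted: wait until the advertised reset time.
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


# Example header values; `now` is fixed so the arithmetic is easy to follow.
sample = {
    "X-RateLimit-Limit": "20",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1700000060",
}
pause = seconds_until_reset(sample, now=1700000018)
```

With the `requests` library, `resp.headers` is case-insensitive and can be passed in directly.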
See the pricing table for per-tier limits. Need more? Email hello@hafiz-ai.com.
Webhooks
For long-running batch transcriptions (set webhook_url in your request), we POST the result back as JSON with the same shape as a sync response, plus an X-Hafiz-Signature HMAC-SHA256 header you should verify with your webhook secret.
    # Verifying a webhook in Python
    import hashlib
    import hmac

    def verify(body: bytes, sig: str, secret: str) -> bool:
        expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, sig)
Pricing
Pay only for the seconds of audio you transcribe. No minimums, no per-request fees, no surprise overage charges — we hard-stop at your monthly cap.
Hobbyist
- 60 minutes / month of audio
- 20 requests / minute
- Word-level timestamps
- Diacritic restoration
- Sandbox keys included
- Community support (GitHub)
Builder
- 20 hours / month included
- Then $0.04 per audio-minute
- 120 requests / minute
- Webhooks for long jobs
- Custom reciter list
- Email support · 48h SLA
Studio
- 250 hours / month included
- Then $0.025 per audio-minute
- 600 requests / minute
- Dedicated model instance
- Custom fine-tunes (quote)
- Priority support · 12h SLA
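To estimate a monthly bill from the tiers above, the arithmetic can be sketched as follows. It assumes usage beyond the included hours is billed at the listed per-minute rate, and it covers overage only, since base subscription prices are not listed here. `overage_cost` and `PLANS` are hypothetical names.

```python
# (included hours per month, overage rate in USD per audio-minute)
PLANS = {
    "Builder": (20, 0.04),
    "Studio": (250, 0.025),
}


def overage_cost(plan, audio_minutes):
    """USD charged for audio beyond the plan's included hours."""
    included_hours, rate = PLANS[plan]
    extra_minutes = max(0.0, audio_minutes - included_hours * 60)
    return round(extra_minutes * rate, 2)


# 25 hours on Builder: 5 hours over the allowance = 300 minutes at $0.04.
builder_bill = overage_cost("Builder", 25 * 60)
```

Summing `audio_seconds` across your responses and dividing by 60 gives the `audio_minutes` input.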
Get your API key
Sign up at auth.hafiz-ai.com with Google, Apple, or email. You'll land on the dashboard with a test key already minted — paste it into the curl example above to verify everything works end-to-end. Live keys (billable) need a card on file, which you can add from the same dashboard.
The live /v1 surface today:
- POST /v1/transcribe: billed, rate-limited, requires an API key.
- POST /v1/transcribe_partial: same auth + billing, low-latency partial decoding for live UIs.
- GET /v1/health: public, no auth, for uptime monitors.
- Dashboard endpoints (/v1/keys, /v1/me, /v1/billing/*): Firebase ID token, used by auth.hafiz-ai.com.
The surface continues to expand — subscribe to the release notes or email hello@hafiz-ai.com for updates.
If you're evaluating for high volume (> 250 hrs / month) and want a custom contract, email us and we'll set you up directly.
FAQ
Does the API support tajweed scoring?
Not yet — but pass target_ayah to /v1/transcribe and you get back per-word confidence values plus a diacritic-restored transcript. That's what Iqra's tajweed-aware grading is built on today. A standalone /v1/align endpoint and a dedicated tajweed scorer are both on the Q3 2026 roadmap.
Can I use this for non-Qur'anic Arabic?
Technically yes — the underlying model is a Whisper variant — but accuracy will be much worse than for Qur'anic text. Use OpenAI / Deepgram / Azure for general Arabic. Use us when the audio is recitation.
What's the maximum file size?
25 MB or 5 minutes per request, whichever comes first. For longer audio, slice it client-side or contact us about the batch endpoint.
Do you store my audio?
By default we keep audio for 24h to debug failures, then delete. You can opt out (X-Hafiz-No-Store: 1 header) and we'll process it in-memory only. See the privacy policy.
Is the model open-source?
The base model (tarteel-ai/whisper-base-ar-quran) is. Our fine-tuned weights are not — but we publish training recipes and evals on our research page.
Can I run this on-device?
Yes — the underlying model is standard Whisper, which converts cleanly to CoreML for iOS (via WhisperKit) and ONNX for Android. We'll publish first-class exports alongside the API launch. Until then, the API will give you the same accuracy with zero device cost.
Is the API stable?
The /v1/transcribe endpoint, authentication scheme, and pricing tiers are stable — we won't break them. We may add new fields to responses and new endpoints (batch, streaming, model variants); we won't remove or rename existing ones without a clear migration window.