Hafiz API

A single REST endpoint that turns Qur'anic recitation audio into text with word-level timing and confidence. No model hosting, no GPU, no training. Just POST a wav file.

What you'll get

Speaker-independent accuracy on Qur'anic Arabic — built on facebook/mms-1b, verified to produce identical transcripts across professional reciters (Sudais, Husary) and unseen voices in our internal evals.
Word-level timestamps so you can build live highlight, scrubbing, or alignment-based grading.
Diacritic restoration — pass the target ayah and we return fully vowelised (Mushaf-style) Arabic.
Sub-second latency for short ayahs: we measured 85–270 ms per ayah on a consumer GPU, and we expect our GPU cloud to be faster.
Pay only for what you transcribe. Free tier covers 60 minutes/month — enough for hobby projects forever.

Live now. Every endpoint below is implemented and accepts real requests as soon as you create an account. Stripe billing, the customer portal, and webhook delivery are all wired up. Report anything off to hello@hafiz-ai.com.

Quickstart — first transcription in 60 seconds

Send an audio file (wav, mp3, m4a, flac, or ogg) to /v1/transcribe with your API key. The response comes back as JSON.

curl https://api.hafiz-ai.com/v1/transcribe \
  -H "Authorization: Bearer hafiz_sk_live_..." \
  -F "file=@recitation.wav" \
  -F "language=ar" \
  -F "return_timestamps=true"

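import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("recitation.wav", "rb") as audio:
    resp = requests.post(
        "https://api.hafiz-ai.com/v1/transcribe",
        headers={"Authorization": "Bearer hafiz_sk_live_..."},
        files={"file": audio},
        # Form fields are sent as strings; "true" matches the curl example.
        data={"language": "ar", "return_timestamps": "true"},
    )
resp.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(resp.json())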

import fs from "node:fs";

const form = new FormData();
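// Pass a filename so the multipart part is not sent under the default name "blob".
form.append("file", new Blob([fs.readFileSync("recitation.wav")]), "recitation.wav");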
form.append("language", "ar");
form.append("return_timestamps", "true");

const resp = await fetch("https://api.hafiz-ai.com/v1/transcribe", {
  method: "POST",
  headers: { Authorization: "Bearer hafiz_sk_live_..." },
  body: form,
});
console.log(await resp.json());

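// Assumes `recordingURL` is the local audio file, and that you define
// `makeMultipart(boundary:fileURL:)` (builds the form body) and a Decodable
// `TranscriptionResponse` type yourself; they are not shown here.
var req = URLRequest(url: URL(string: "https://api.hafiz-ai.com/v1/transcribe")!)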
req.httpMethod = "POST"
req.setValue("Bearer hafiz_sk_live_...", forHTTPHeaderField: "Authorization")

let boundary = "hafiz-\(UUID().uuidString)"
req.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")
req.httpBody = makeMultipart(boundary: boundary, fileURL: recordingURL)

let (data, _) = try await URLSession.shared.data(for: req)
let result = try JSONDecoder().decode(TranscriptionResponse.self, from: data)

You'll get back something like:

{
  "text": "بسم الله الرحمن الرحيم",
  "language": "ar",
  "audio_seconds": 3.84,
  "inference_seconds": 0.71,
  "model": "mms-1b-quran-everyayah-5reciters-v1",
  "words": [
    { "word": "بسم", "start": 0.12, "end": 0.61, "confidence": 0.97 },
    { "word": "الله", "start": 0.62, "end": 1.18, "confidence": 0.99 },
    { "word": "الرحمن", "start": 1.21, "end": 2.02, "confidence": 0.98 },
    { "word": "الرحيم", "start": 2.05, "end": 3.79, "confidence": 0.96 }
  ]
}
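The `words` array is enough to drive a live highlight. The following is a minimal sketch, assuming the response has already been parsed into a dict and that `words` is sorted by `start`; `word_at` is a hypothetical helper name, not part of the API.

```python
from bisect import bisect_right

def word_at(words, t):
    """Return the word dict whose [start, end] span contains time t (seconds), or None."""
    starts = [w["start"] for w in words]
    i = bisect_right(starts, t) - 1      # index of the last word starting at or before t
    if i >= 0 and t <= words[i]["end"]:
        return words[i]
    return None                          # t is in a gap between words or outside the audio

# Sample data copied from the response above.
words = [
    {"word": "بسم", "start": 0.12, "end": 0.61, "confidence": 0.97},
    {"word": "الله", "start": 0.62, "end": 1.18, "confidence": 0.99},
    {"word": "الرحمن", "start": 1.21, "end": 2.02, "confidence": 0.98},
    {"word": "الرحيم", "start": 2.05, "end": 3.79, "confidence": 0.96},
]
current = word_at(words, 1.5)            # the word to highlight at playback time 1.5 s
```

Call `word_at` with the player's current position on each UI tick; a `None` result means nothing should be highlighted at that instant.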

Authentication

Every request must include an Authorization header with your secret key:

Authorization: Bearer hafiz_sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Keys come in two flavours:

hafiz_sk_test_… — sandbox, free, no charges, deterministic mocked transcripts for CI.
hafiz_sk_live_… — production, billed per second of audio.

Treat keys like passwords. Never commit them to a public repo or ship them in client-side iOS / Android / web bundles. Use a server-side proxy or our short-lived JWT flow (contact us).
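One way to keep keys out of source control is to read them from the environment on your server. This sketch assumes the key is stored in a `HAFIZ_KEY` environment variable (the name used in the curl example further down); `auth_headers` and `is_live` are hypothetical helper names.

```python
import os

KEY_PREFIXES = ("hafiz_sk_test_", "hafiz_sk_live_")

def auth_headers(env=os.environ):
    """Build the Authorization header from the HAFIZ_KEY environment variable."""
    key = env.get("HAFIZ_KEY", "")
    if not key.startswith(KEY_PREFIXES):
        raise RuntimeError("HAFIZ_KEY is missing or does not look like a Hafiz secret key")
    return {"Authorization": f"Bearer {key}"}

def is_live(key):
    """True for billable production keys, False for sandbox keys."""
    return key.startswith("hafiz_sk_live_")
```

Checking `is_live` before running a test suite is a cheap guard against accidentally billing CI runs to a production key.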

Transcribe an audio file

Full transcription of a recording. Best for whole-ayah submissions or post-recitation analysis.

POST https://api.hafiz-ai.com/v1/transcribe

Multipart body

Field	Type	Description
`file` required	file	Audio file. Supported: `wav`, `mp3`, `m4a`, `flac`, `ogg`. Max 25 MB / 5 minutes.
`language` optional	string	Defaults to `ar`. Reserved for future multilingual variants.
`return_timestamps` optional	bool	Defaults to `true`. Include word-level start/end timing.
`target_ayah` optional	string	The target Mushaf ayah text. If supplied, the response includes a `diacritized_text` field with restored ḥarakāt aligned to your target.
`max_new_tokens` optional	integer	Maximum number of tokens to decode. Defaults to `225`. Range 32–448.
`webhook_url` optional	string	If set, the API returns `202 Accepted` immediately and POSTs the result to this URL when ready. Recommended for files >30s.

Example

curl -X POST https://api.hafiz-ai.com/v1/transcribe \
  -H "Authorization: Bearer $HAFIZ_KEY" \
  -F "file=@al-fatiha-1.wav" \
  -F "target_ayah=بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ"
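Checking the documented limits locally avoids a round trip that would end in `invalid_audio`. The sketch below handles WAV input only, using the standard-library `wave` module. It assumes "25 MB" means 25,000,000 bytes (the stricter reading); `preflight_wav` is a hypothetical helper name.

```python
import os
import wave

MAX_BYTES = 25_000_000   # assumption: decimal megabytes, the stricter reading of "25 MB"
MAX_SECONDS = 5 * 60

def preflight_wav(path):
    """Return (ok, reason) for a local WAV file, checked against the documented limits."""
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        return False, f"file is {size} bytes, over the 25 MB limit"
    with wave.open(path, "rb") as w:
        seconds = w.getnframes() / w.getframerate()
    if seconds > MAX_SECONDS:
        return False, f"audio is {seconds:.1f} s, over the 5 minute limit"
    return True, "ok"
```

Compressed formats such as mp3 or m4a need a decoder to measure duration, so for those only the size check carries over directly.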

Live partial transcription

Optimised for low-latency progressive transcription. Send 2–6 second chunks of audio every few seconds and get back what's been recited so far. This is the path Iqra's "Hafiz Live" mode will use.

POST https://api.hafiz-ai.com/v1/transcribe_partial

Same shape as /v1/transcribe but uses a smaller token budget (default 120) and skips post-processing for sub-300ms turnaround on most M-series Macs.

Tip: Send a rolling snapshot of your local recording every 2.5 seconds, and replace the displayed transcript each time. Don't try to stitch chunks — the model handles that.
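The rolling-snapshot pattern from the tip can be sketched as follows. This assumes the recorder exposes raw 16-bit mono PCM at 16 kHz (an assumption about your capture setup, not a documented API requirement). The callables `read_pcm_so_far`, `post_partial`, `show`, and `is_recording` are hypothetical hooks you would supply; `post_partial` is expected to upload the bytes to /v1/transcribe_partial and return the parsed JSON.

```python
import io
import wave

def snapshot_wav(pcm, sample_rate=16000):
    """Wrap the raw 16-bit mono PCM bytes recorded so far in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)            # 2 bytes per sample = 16-bit audio
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()

def live_loop(read_pcm_so_far, post_partial, show, is_recording, sleep, interval=2.5):
    """While recording, upload the whole recording so far and replace the transcript."""
    while is_recording():
        result = post_partial(snapshot_wav(read_pcm_so_far()))
        show(result["text"])         # replace the displayed text, do not append
        sleep(interval)
```

Passing `sleep` and the other hooks in as arguments keeps the loop easy to test without a microphone or network access; in an app you would pass `time.sleep` or schedule the body on a timer instead.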

Service health

GET https://api.hafiz-ai.com/v1/health

Returns { "status": "ok", "ready": true, "backend": "google-cloud-speech-v1", "model": "...", "language": "ar-SA" }. No auth required. Suitable for uptime monitors.
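An uptime probe built on this endpoint might look like the sketch below, which uses only the standard library. Treating the service as healthy only when `status` is `"ok"` and `ready` is `true` is an assumption about how you want to interpret the fields; `is_healthy` and `check_health` are hypothetical names.

```python
import json
import urllib.request

HEALTH_URL = "https://api.hafiz-ai.com/v1/health"

def is_healthy(payload):
    """True only when the payload reports status "ok" and ready true."""
    return (
        isinstance(payload, dict)
        and payload.get("status") == "ok"
        and payload.get("ready") is True
    )

def check_health(url=HEALTH_URL, timeout=5):
    """Fetch the health endpoint; any network or decoding failure counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError):    # URLError/HTTPError are OSError; bad JSON is ValueError
        return False
```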

Dashboard & billing endpoints

These are the endpoints the developer dashboard at auth.hafiz-ai.com/dashboard.html uses. They require a Firebase ID token in the Authorization header (issued automatically on sign-in) rather than an API key. They aren't meant for direct programmatic use, but we document them here in case you want to wire them into a custom dashboard.

Method	Path	Purpose
`POST`	`/v1/bootstrap`	Idempotent: create / update your user record on first sign-in.
`GET`	`/v1/me`	Returns your plan, usage this period, and email.
`POST`	`/v1/keys`	Mint a new API key (form field `livemode=true|false`). Plaintext returned once.
`GET`	`/v1/keys`	List the prefixes & metadata of all your keys.
`DELETE`	`/v1/keys/{id}`	Revoke a key by its hash id.
`POST`	`/v1/billing/checkout`	Start a Stripe Checkout session. Body: `plan`, `success_url`, `cancel_url`.
`POST`	`/v1/billing/portal`	Open the Stripe customer portal. Body: `return_url`.
`POST`	`/v1/billing/webhook`	Stripe → Hafiz webhook receiver. Verifies `Stripe-Signature`.

Roadmap: A standalone /v1/align endpoint (per-word verdicts: correct, near_match, substituted, missed, extra) is on the Q3 2026 plan. Today this logic ships inside the iOS app — pass target_ayah to /v1/transcribe and use the per-word confidence values for grading server-side.
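Until /v1/align exists, a simple server-side grader can be built on the confidence values alone. The sketch below is not the logic that ships in the iOS app; the 0.90 threshold is an arbitrary assumption you would tune on your own recordings, and `flag_words` is a hypothetical helper name.

```python
def flag_words(words, threshold=0.90):
    """Split recognised words into (accepted, needs_review) lists by confidence."""
    accepted, needs_review = [], []
    for w in words:
        bucket = accepted if w["confidence"] >= threshold else needs_review
        bucket.append(w["word"])
    return accepted, needs_review

sample = [
    {"word": "بسم", "start": 0.12, "end": 0.61, "confidence": 0.97},
    {"word": "الله", "start": 0.62, "end": 1.18, "confidence": 0.62},
]
accepted, needs_review = flag_words(sample)
```

A low confidence value only says the model was unsure about a word; it does not say what kind of mistake, if any, the reciter made.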

Response shape

Field	Type	Description
`text`	string	Plain Arabic transcript (no diacritics by default).
`diacritized_text`	string?	Present only if `target_ayah` was supplied. Returns Mushaf-style fully-vowelised Arabic.
`words`	array	Each: `{ word, start, end, confidence }`. Times in seconds.
`language`	string	Detected/forced language. Always `"ar"` in v1.
`audio_seconds`	number	Length of the input audio. This is what gets billed.
`inference_seconds`	number	Wall-clock time we spent on the GPU.
`model`	string	Versioned model identifier — pin against this in production.
`request_id`	string	Surface this in support tickets.
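A typed wrapper makes the optional fields explicit. This is a sketch under two assumptions: that "pin against this" means comparing `model` to a value you recorded earlier, and that `request_id` may be absent (it does not appear in the sample response above). `Transcription` and `parse_transcription` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transcription:
    text: str
    audio_seconds: float
    inference_seconds: float
    model: str
    diacritized_text: Optional[str] = None   # only present when target_ayah was sent
    request_id: Optional[str] = None

    @property
    def real_time_factor(self):
        """Processing time per second of audio; below 1.0 is faster than real time."""
        return self.inference_seconds / self.audio_seconds

def parse_transcription(payload, expected_model=None):
    """Convert the JSON body into a Transcription, optionally checking a pinned model."""
    if expected_model is not None and payload["model"] != expected_model:
        raise ValueError(f"model changed: {payload['model']!r} != {expected_model!r}")
    return Transcription(
        text=payload["text"],
        audio_seconds=payload["audio_seconds"],
        inference_seconds=payload["inference_seconds"],
        model=payload["model"],
        diacritized_text=payload.get("diacritized_text"),
        request_id=payload.get("request_id"),
    )
```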

Errors

All errors return JSON of shape { "error": { "code": "...", "message": "..." } }.

HTTP	Code	Meaning
400	`invalid_audio`	File could not be decoded, or is > 25 MB / 5 min.
401	`missing_key`	No `Authorization` header.
401	`invalid_key`	Key is malformed or revoked.
402	`quota_exceeded`	Free tier consumed for the month, or paid card declined.
429	`rate_limited`	Too many concurrent requests. Honor `Retry-After`.
500	`internal`	Our side. Auto-reported. Safe to retry once.
503	`queue_full`	Spike capacity hit. Retry with exponential backoff.
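The retry guidance in the table can be expressed as one small policy function. This is a sketch: the base delay, the 30-second cap, and the limit of five attempts are assumptions, and `retry_delay` is a hypothetical helper name.

```python
def retry_delay(status, attempt, retry_after=None, base=1.0, cap=30.0, max_attempts=5):
    """Seconds to wait before retry number `attempt` (1-based), or None to give up."""
    if attempt > max_attempts:
        return None
    if status == 429:
        # Honor Retry-After when it is a plain number of seconds; otherwise use the base delay.
        try:
            return float(retry_after)
        except (TypeError, ValueError):
            return base
    if status == 500:
        return base if attempt == 1 else None        # documented as safe to retry once
    if status == 503:
        return min(cap, base * 2 ** (attempt - 1))   # exponential backoff: 1, 2, 4, ...
    return None                                      # 400, 401, 402: retrying will not help
```

Adding random jitter to the 503 delay is a common refinement when many clients may retry at the same moment.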

Rate limits

Limits are per-key and reset every 60 seconds. Each response includes:

X-RateLimit-Limit — the cap.
X-RateLimit-Remaining — calls left in the window.
X-RateLimit-Reset — Unix epoch when the window resets.

See the pricing table for per-tier limits. Need more? Email hello@hafiz-ai.com.
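A client can use these headers to pause before it hits a 429. The sketch below assumes the header values arrive as strings in a dict-like object keyed exactly as listed above; `seconds_until_allowed` is a hypothetical helper name.

```python
import time

def seconds_until_allowed(headers, now=None):
    """Return how long to pause before the next call, based on the rate-limit headers."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0                       # calls are still available in this window
    reset = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset - now)         # wait until the window resets
```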

Webhooks

For long-running batch transcriptions (set webhook_url in your request), we POST the result back as JSON with the same shape as a sync response, plus an X-Hafiz-Signature HMAC-SHA256 header you should verify with your webhook secret.

# Verifying a webhook in Python
import hmac, hashlib

def verify(body: bytes, sig: str, secret: str) -> bool:
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)

Pricing

Pay only for the seconds of audio you transcribe. No minimums, no per-request fees, no surprise overage charges — we hard-stop at your monthly cap.

Free

Hobbyist

$0/month

60 minutes / month of audio
20 requests / minute
Word-level timestamps
Diacritic restoration
Sandbox keys included
Community support (GitHub)

Builder

$29/month

20 hours / month included
Then $0.04 per audio-minute
120 requests / minute
Webhooks for long jobs
Custom reciter list
Email support · 48h SLA

Scale

Studio

$249/month

250 hours / month included
Then $0.025 per audio-minute
600 requests / minute
Dedicated model instance
Custom fine-tunes (quote)
Priority support · 12h SLA

Need an enterprise / on-prem deployment? We can ship the model as a Docker image to run inside your VPC or air-gapped network. Email hello@hafiz-ai.com.
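As a worked example of the Builder and Studio arithmetic, the sketch below estimates a monthly bill from total audio seconds. It assumes overage is prorated per second at the listed per-minute rate and that no monthly cap stops usage first; `PLANS` and `monthly_cost` are hypothetical names, with figures copied from the tiers above.

```python
from decimal import Decimal

PLANS = {
    "builder": {"base": Decimal("29"), "included_hours": 20, "per_minute": Decimal("0.04")},
    "studio": {"base": Decimal("249"), "included_hours": 250, "per_minute": Decimal("0.025")},
}

def monthly_cost(plan, audio_seconds):
    """Estimated bill in USD for one month of usage on the given plan."""
    p = PLANS[plan]
    included_seconds = p["included_hours"] * 3600
    overage_seconds = max(0, audio_seconds - included_seconds)
    overage = p["per_minute"] * Decimal(overage_seconds) / Decimal(60)
    return (p["base"] + overage).quantize(Decimal("0.01"))

# 25 hours on Builder: 5 hours over = 300 minutes x $0.04 = $12.00, plus the $29 base.
builder_bill = monthly_cost("builder", 25 * 3600)   # Decimal("41.00")
```

Under the same assumptions, 300 hours on Studio comes to 3,000 overage minutes at $0.025, or $75.00 on top of the $249 base.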

Get your API key

Sign up at auth.hafiz-ai.com with Google, Apple, or email. You'll land on the dashboard with a test key already minted — paste it into the curl example above to verify everything works end-to-end. Live keys (billable) need a card on file, which you can add from the same dashboard.

The live /v1 surface today:

POST /v1/transcribe — billed, rate-limited, requires an API key.
POST /v1/transcribe_partial — same auth + billing, low-latency partial decoding for live UIs.
GET /v1/health — public, no auth, for uptime monitors.
Dashboard endpoints (/v1/keys, /v1/me, /v1/billing/*) — Firebase ID token, used by auth.hafiz-ai.com.

The surface continues to expand — subscribe to the release notes or email hello@hafiz-ai.com for updates.

If you're evaluating for high volume (> 250 hrs / month) and want a custom contract, email us and we'll set you up directly.

FAQ

Does the API support tajweed scoring?

Not yet — but pass target_ayah to /v1/transcribe and you get back per-word confidence values plus a diacritic-restored transcript. That's what Iqra's tajweed-aware grading is built on today. A standalone /v1/align endpoint and a dedicated tajweed scorer are both on the Q3 2026 roadmap.

Can I use this for non-Qur'anic Arabic?

Technically yes — the underlying model is an MMS / wav2vec 2.0 variant — but accuracy will be much worse than for Qur'anic text. Use OpenAI / Deepgram / Azure for general Arabic. Use us when the audio is recitation.

What's the maximum file size?

25 MB or 5 minutes per request, whichever comes first. For longer audio, slice it client-side or contact us about the batch endpoint.
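Client-side slicing of a WAV file can be sketched with the standard-library `wave` module. The version below cuts at fixed offsets so that each part stays within both limits, reading "25 MB" as 25,000,000 bytes (an assumption). `split_wav` is a hypothetical helper name.

```python
import wave

MAX_BYTES = 25_000_000   # assumption: decimal megabytes, the stricter reading of "25 MB"

def split_wav(src_path, out_prefix, max_seconds=300, max_bytes=MAX_BYTES):
    """Split a WAV file into consecutive parts within both limits; return their paths."""
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        bytes_per_frame = params.nchannels * params.sampwidth
        frames_per_part = min(
            max_seconds * params.framerate,           # duration limit
            (max_bytes - 1024) // bytes_per_frame,    # size limit, with slack for the header
        )
        while True:
            frames = src.readframes(frames_per_part)
            if not frames:
                break
            path = f"{out_prefix}-{len(paths):03d}.wav"
            with wave.open(path, "wb") as dst:
                dst.setparams(params)                 # frame count in the header is fixed up on write
                dst.writeframes(frames)
            paths.append(path)
    return paths
```

Fixed offsets can cut through the middle of a word, so cutting at ayah boundaries or pauses is likely to transcribe better when you know where they are.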

Do you store my audio?

By default we keep audio for 24h to debug failures, then delete. You can opt out (X-Hafiz-No-Store: 1 header) and we'll process it in-memory only. See the privacy policy.

Is the model open-source?

The base model (facebook/mms-1b) is. Our fine-tuned weights are not — but we publish training recipes and evals on our research page.

Can I run this on-device?

Yes — the underlying model is a wav2vec 2.0 (MMS) architecture, which converts cleanly to CoreML for iOS and ONNX for Android via 🤗 Optimum / Core ML Tools. We'll publish first-class exports alongside the API launch. Until then, the API will give you the same accuracy with zero device cost.

Is the API stable?

The /v1/transcribe endpoint, authentication scheme, and pricing tiers are stable — we won't break them. We may add new fields to responses and new endpoints (batch, streaming, model variants); we won't remove or rename existing ones without a clear migration window.

Qur'anic Arabic speech recognition, by the engineers who understand recitation.
