Deepgram is a speech-to-text API for developers. Not an app. Not a Zoom integration. If you came here looking to transcribe one meeting and move on, this isn't the right tool — pick something with a web UI and skip the SDK.
If you're building a product that needs speech-to-text at scale, Deepgram is one of the strongest choices on the market. The pricing is genuinely aggressive, the low streaming latency is real, and the models hold up. It's also opinionated, API-first, and asks you to make calls that other vendors hide from you.
We spent a week pushing audio through it. Here's the honest read.
- Deepgram is API-only — there's no web UI for transcribing files like Otter or Rev
- Nova-3 is the current flagship STT model; accuracy is competitive with Whisper-Large and AssemblyAI Universal
- Pre-paid pay-as-you-go starts around $0.0043/min for streaming — roughly $0.26/hour
- $200 in free credits when you sign up — enough to transcribe several hundred hours on the standard models
- Diarization is included and decent, but not as polished as AssemblyAI's on hard multi-speaker audio
- Best fit: dev teams building real-time voice products, call analytics, voice agents, or regulated workloads that need on-prem deployment
What is Deepgram, exactly?
Deepgram is a speech recognition platform. You send audio, you get back a transcript — over REST, WebSocket, or one of their SDKs (Python, Node, .NET, Go, Rust). There's a dashboard for keys and usage, but the product is the API.
That sets the floor. To use Deepgram you need a developer in the loop, or a third-party tool that wraps it. If you want to drag an mp3 into a browser and download an SRT, look elsewhere — this isn't that.
What you get in return: speed, control, and prices that get cheap fast at volume.
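To make "you send audio, you get back a transcript" concrete, here is a minimal sketch of a pre-recorded request built with Python's standard library. The endpoint (`https://api.deepgram.com/v1/listen`), the `Token` authorization scheme, and the `model`, `smart_format`, and `diarize` query parameters reflect our understanding of Deepgram's REST API; treat them as assumptions and confirm them against the current API reference. `build_listen_request` is a hypothetical helper name, and it only constructs the request. Nothing is sent until you pass the result to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request

# Assumed endpoint for pre-recorded audio; verify against the current docs.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(audio: bytes, api_key: str, model: str = "nova-3",
                         diarize: bool = False,
                         content_type: str = "audio/wav") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes."""
    params = {"model": model, "smart_format": "true"}
    if diarize:
        params["diarize"] = "true"
    url = f"{DEEPGRAM_LISTEN_URL}?{urllib.parse.urlencode(params)}"
    headers = {
        "Authorization": f"Token {api_key}",  # assumed auth scheme
        "Content-Type": content_type,
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")
```

Sending it is one call, `urllib.request.urlopen(req)`, and the response body should be JSON. We would expect the transcript under `results.channels[0].alternatives[0].transcript`, but that path is also an assumption to verify against a real response.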
How accurate is Deepgram?
Deepgram's current flagship is Nova-3. On clean conversational English they publish word error rates in the high single digits, competitive with Whisper-Large and AssemblyAI Universal, and they post their benchmark methodology so you can argue with it.
In practice, expect roughly:
- Clean dictation or studio audio: 4–7% WER
- Two-speaker phone calls: 7–12% WER
- Multi-speaker meetings with crosstalk: 12–20% WER
- Heavily accented English: degrades like every model — sometimes a lot
Nova-3 added stronger handling of multilingual code-switching and better performance on long-tail named entities (drug names, product SKUs, that kind of thing). It's not magic, but it's a real upgrade over Nova-2 on the messy stuff.
One thing to flag: vendor-published WER is almost always on the vendor's own internal test set. Run it on your audio before you believe the number.
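Checking the number yourself takes very little code. The sketch below computes word error rate as word-level edit distance divided by the reference length, which is the standard definition. The normalization is deliberately crude (lowercasing and whitespace splitting only), and `word_error_rate` is our own helper name, not part of any SDK. A serious evaluation would also strip punctuation and normalize numbers before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Feed it a human-corrected transcript as the reference and the API output as the hypothesis, then average over a sample of your own recordings.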
How much does Deepgram cost in 2026?
Deepgram is priced per minute of audio processed. There's a generous free credit, then pay-as-you-go, then committed-volume pricing for larger customers.
| Plan | Model | Approx. rate (USD/min) | Notes |
|---|---|---|---|
| Pay-as-you-go | Nova-3 | ~$0.0043 (streaming or batch) | Per Deepgram's pricing page |
| Pay-as-you-go | Nova-2 / Base | Cheaper still | Older models, often plenty |
| Growth (committed) | Nova-3 | ~10–30% discount | Annual commit |
| Enterprise | Custom | Negotiated | SLAs, on-prem options |
For context, $0.0043/min is about $0.26/hour. AssemblyAI's pay-as-you-go Universal sits around $0.37/hour. Rev's human transcription runs about $1.50/min — orders of magnitude higher because human review is a different product, not the same one done worse.
The $200 signup credit is enough for several hundred hours of Nova-3 on batch — a real evaluation budget, not a vanity giveaway.
Watch for: add-on features (diarization, summarization, redaction) are generally bundled into Nova-3 pricing, but some legacy models charge extra for them. Check the current pricing page before you put numbers in a finance spreadsheet.
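The per-hour and free-credit figures in this section are simple arithmetic, sketched here so you can swap in whatever the pricing page says today. The rates are the approximate ones quoted in this review, not authoritative prices, and the helper names are ours.

```python
def per_hour(rate_per_min: float) -> float:
    """Convert a per-minute rate (USD) to a per-hour rate."""
    return rate_per_min * 60


def hours_covered(credit_usd: float, rate_per_min: float) -> float:
    """Hours of audio a credit buys at a flat per-minute rate."""
    return credit_usd / rate_per_min / 60


NOVA3_PER_MIN = 0.0043    # approximate pay-as-you-go figure from this review
REV_HUMAN_PER_MIN = 1.50  # approximate human transcription figure from this review

print(f"Nova-3: ${per_hour(NOVA3_PER_MIN):.2f}/hour")
print(f"Rev (human): ${per_hour(REV_HUMAN_PER_MIN):.2f}/hour")
print(f"$200 credit: about {hours_covered(200, NOVA3_PER_MIN):.0f} hours")
```

At these rates the credit works out to roughly 775 hours of audio, which is where "several hundred hours" comes from.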
What can Deepgram do that other APIs can't?
A few things stand out.
Real-time streaming that actually feels real-time. Sub-300ms end-to-end latency on a decent connection. If you're building a voice agent or a live captioning tool, that's the bar.
Self-hosting. Most STT APIs are cloud-only. Deepgram offers on-premises and VPC deployments for customers who need data residency or hardware-bound inference. That's rare in this market and a quiet differentiator for regulated industries.
Custom model training. You can fine-tune on your domain audio. If your product handles medical dictation, sports commentary, or anything with vocabulary the public models butcher, that lever is genuinely useful.
Multilingual without switching models. Nova-3 covers 36+ languages and code-switches mid-sentence on the better-supported pairs. If your audio mixes English and Spanish in one utterance, that matters.
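The sub-300ms latency claim is worth checking on your own traffic rather than taking on faith. A minimal sketch, assuming you already log when each audio chunk was sent and when its transcript came back: the `(sent_at, received_at)` pair format and the `latency_summary` name are hypothetical, and how you match a transcript to a chunk depends on your client.

```python
import statistics


def latency_summary(samples: list[tuple[float, float]]) -> dict[str, float]:
    """Summarize (sent_at, received_at) timestamp pairs, in seconds, as milliseconds."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    latencies_ms = [(received - sent) * 1000 for sent, received in samples]
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
    p95 = statistics.quantiles(latencies_ms, n=20, method="inclusive")[-1]
    return {"p50_ms": statistics.median(latencies_ms), "p95_ms": p95}
```

Look at the 95th percentile, not just the median. A voice agent that is fast on average but stalls on one turn in twenty still feels slow.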
Where does Deepgram fall short?
It's not the right tool for everything.
No file-upload UI. Said this already, repeating because users keep landing here looking for a Rev replacement and bouncing. There is no drag-and-drop. Build your own or pick a different product.
Diarization is good, not great. Speaker labels are accurate enough for analytics but often need cleanup for publication. AssemblyAI's diarization is meaningfully better on hard multi-speaker audio. Our piece on why speaker labels are wrong covers the failure modes that bite both vendors.
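Much of the speaker-label cleanup is mechanical. A minimal sketch, assuming diarized output arrives as a list of word objects that each carry the word text and a numeric speaker label: the `word` and `speaker` keys are assumptions about the response shape, so adapt them to the payload you actually get. It collapses word-level labels into speaker turns that a human can read and correct.

```python
def group_turns(words: list[dict]) -> list[tuple[int, str]]:
    """Merge consecutive words with the same speaker label into turns."""
    turns: list[tuple[int, list[str]]] = []
    for w in words:
        if turns and turns[-1][0] == w["speaker"]:
            # Same speaker as the previous word: extend the current turn
            turns[-1][1].append(w["word"])
        else:
            # Speaker changed (or first word): start a new turn
            turns.append((w["speaker"], [w["word"]]))
    return [(speaker, " ".join(tokens)) for speaker, tokens in turns]
```

Very short turns sandwiched between two turns from the same speaker are often mislabels, so they are a reasonable first place to look when reviewing.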
Docs have rough edges. The core API is fine. The fringes — custom vocabularies, advanced redaction config, the gRPC endpoints — get into "read the source" territory faster than you'd expect.
Lock-in risk. Deepgram's model names and feature endpoints don't map cleanly to other vendors. Switching costs are real, especially if you've built on Nova-3-specific features like keyterm prompting.
Who is Deepgram for?
A short list:
- Voice agent builders. Sub-300ms latency, WebSocket streaming, predictable pricing.
- Call center and sales analytics platforms. High volume, diarization, sentiment, summarization in one API.
- Live captioning products. Streaming latency holds up under load.
- Teams that need on-prem speech-to-text. Almost nobody else offers it seriously.
If you're a researcher transcribing 20 interviews, you don't need this. If you're a journalist with one Zoom recording, you really don't. Pick a tool with a UI.
Deepgram vs the alternatives
Honest snapshot against the obvious comparisons. The full breakdown lives in our AssemblyAI alternatives roundup and the 2026 services comparison.
- vs AssemblyAI: Deepgram is cheaper and faster; AssemblyAI has stronger diarization and more polished audio intelligence (entity detection, sentiment, content safety).
- vs OpenAI Whisper API: Deepgram wins on streaming and per-minute price at volume; Whisper-via-API wins on robust accuracy across messy long-form audio with no tuning.
- vs self-hosted Whisper: If you have engineers and GPUs, faster-whisper is essentially free at marginal cost. If you don't, Deepgram is the lower-headache choice and probably cheaper than you think.
Should you use Deepgram?
If you're shipping a product that needs speech-to-text in the loop — yes, evaluate it seriously. Spend the free credits on your actual audio, not their demo files. Test diarization on your worst recording, not your best. Measure latency on the network conditions your real users have, not your office Wi-Fi.
If you're a one-off transcriber — no. Just transcribe the file in a browser and skip the SDK entirely.
FAQ
Does Deepgram have a free tier?
There's no permanent free tier, but new accounts get $200 in credits. That's enough to transcribe several hundred hours on the standard models — a real budget for evaluation, not a 30-second trial.
Can I use Deepgram without coding?
Not really. It's an API. You'd need a developer or a third-party tool that wraps Deepgram. Some workflow products do, but the native experience is code.
Is Deepgram HIPAA-compliant?
Deepgram offers HIPAA-eligible deployments for paid customers under a BAA. Pay-as-you-go is not HIPAA-eligible by default — talk to their sales team if you need it.
Does Deepgram support live transcription?
Yes. It's one of the product's core strengths. WebSocket streaming with sub-300ms latency on a stable connection.
Can I run Deepgram on-premises?
Yes. Deepgram offers self-hosted deployments for enterprise customers. That's unusual in the STT market and a real differentiator for regulated industries.
How does Deepgram pricing compare to OpenAI Whisper API?
Whisper API is $0.006/min flat. Deepgram Nova-3 streaming is around $0.0043/min on pay-as-you-go. At volume the gap widens because Deepgram offers committed-use discounts and Whisper API does not.



