Deepgram is a speech-to-text API for developers. Not an app. Not a Zoom integration. If you came here looking to transcribe one meeting and move on, this isn't the right tool — pick something with a web UI and skip the SDK.

If you're building a product that needs speech-to-text at scale, Deepgram is one of the strongest choices on the market. The pricing is genuinely aggressive, the streaming latency is low enough for live use, and the models hold up. It's also opinionated, API-first, and asks you to make decisions that other vendors hide from you.

We spent a week pushing audio through it. Here's the honest read.

Key takeaways
  • Deepgram is API-only: unlike Otter or Rev, it has no web UI for transcribing files
  • Nova-3 is the current flagship STT model; accuracy is competitive with Whisper-Large and AssemblyAI Universal
  • Pre-paid pay-as-you-go starts around $0.0043/min for streaming — roughly $0.26/hour
  • $200 in free credits when you sign up — enough to transcribe several hundred hours on the standard models
  • Diarization is included and decent, but not as polished as AssemblyAI's on hard multi-speaker audio
  • Best fit: dev teams building real-time voice products, call analytics, voice agents, or anything regulated that needs on-prem

What is Deepgram, exactly?

Deepgram is a speech recognition platform. You send audio, you get back a transcript — over REST, WebSocket, or one of their SDKs (Python, Node, .NET, Go, Rust). There's a dashboard for keys and usage, but the product is the API.
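To make "you send audio, you get back a transcript" concrete, here is a minimal sketch of the REST path using only the Python standard library. The endpoint URL, the `Token` authorization scheme, the `model` and `smart_format` query parameters, and the response shape are assumptions based on our reading of Deepgram's public docs, so check the current API reference before relying on any of them. The helper names are ours.

```python
import json
import urllib.request

# Assumed endpoint; verify against Deepgram's current API reference.
DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def build_request(api_key: str, audio: bytes, model: str = "nova-3",
                  content_type: str = "audio/wav") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes."""
    # Query parameter names are assumptions, not confirmed by this review.
    url = f"{DEEPGRAM_URL}?model={model}&smart_format=true"
    headers = {
        "Authorization": f"Token {api_key}",
        "Content-Type": content_type,
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


def extract_transcript(response_body: str) -> str:
    """Pull the first alternative of the first channel out of a JSON response.

    The nesting used here is the assumed response shape.
    """
    payload = json.loads(response_body)
    return payload["results"]["channels"][0]["alternatives"][0]["transcript"]
```

Sending the request would be a call to `urllib.request.urlopen(build_request(key, audio_bytes))`, with the response body passed to `extract_transcript`. The official SDKs wrap the same round trip.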

That sets the floor. To use Deepgram you need a developer in the loop, or a third-party tool that wraps it. If you want to drag an mp3 into a browser and download an SRT, look elsewhere — this isn't that.

What you get in return: speed, control, and prices that get cheap fast at volume.

How accurate is Deepgram?

Deepgram's current flagship is Nova-3. On clean conversational English they publish word error rates in the high single digits, competitive with Whisper-Large and AssemblyAI Universal, and they post their benchmark methodology so you can argue with it.

In practice, expect roughly:

Nova-3 added stronger handling of multilingual code-switching and better performance on long-tail named entities (drug names, product SKUs, that kind of thing). It's not magic, but it's a real upgrade over Nova-2 on the messy stuff.

One thing to flag: vendor-published WER is almost always on the vendor's own internal test set. Run it on your audio before you believe the number.
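Checking the number on your own audio takes a reference transcript you trust and a few lines of code. A minimal sketch of word error rate as word-level edit distance divided by reference length; it only lowercases and splits on whitespace, so punctuation and number formatting will inflate the score unless you normalize both sides first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one dropped word against six reference words.
score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {score:.2%}")
```

Run it over a few dozen clips that look like your production audio and compare vendors on the same set.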

How much does Deepgram cost in 2026?

Deepgram is priced per minute of audio processed. There's a generous free credit, then pay-as-you-go, then committed-volume pricing for larger customers.

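| Plan               | Model         | Approx. rate (USD/min)     | Notes                      |
|--------------------|---------------|----------------------------|----------------------------|
| Pay-as-you-go      | Nova-3        | ~$0.0043 streaming / batch | Per Deepgram pricing       |
| Pay-as-you-go      | Nova-2 / Base | Cheaper still              | Older models, often plenty |
| Growth (committed) | Nova-3        | ~10–30% discount           | Annual commit              |
| Enterprise         | Custom        | Negotiated                 | SLAs, on-prem options      |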

For context, $0.0043/min is about $0.26/hour. AssemblyAI's pay-as-you-go Universal sits around $0.37/hour. Rev's human transcription runs about $1.50/min — orders of magnitude higher because human review is a different product, not the same one done worse.

The $200 signup credit covers roughly 775 hours of Nova-3 on batch at $0.0043/min. That's a real evaluation budget, not a vanity giveaway.
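The per-hour and credit figures above are plain arithmetic on the quoted rates. A quick sketch using the approximate numbers from this review; they will drift, so swap in whatever the current pricing pages say.

```python
# Approximate rates quoted in this review, in USD per minute of audio.
DEEPGRAM_NOVA3_PER_MIN = 0.0043
REV_HUMAN_PER_MIN = 1.50
SIGNUP_CREDIT_USD = 200.0


def per_hour(rate_per_min: float) -> float:
    """Convert a per-minute rate to a per-hour rate."""
    return rate_per_min * 60


def hours_covered(budget_usd: float, rate_per_min: float) -> float:
    """Hours of audio a budget buys at a flat per-minute rate."""
    return budget_usd / rate_per_min / 60


print(f"Nova-3: ${per_hour(DEEPGRAM_NOVA3_PER_MIN):.2f}/hour")
print(f"Rev human: ${per_hour(REV_HUMAN_PER_MIN):.2f}/hour")
credit_hours = hours_covered(SIGNUP_CREDIT_USD, DEEPGRAM_NOVA3_PER_MIN)
print(f"Credit covers about {credit_hours:.0f} hours")
```

That works out to about $0.26/hour for Nova-3, $90/hour for Rev's human transcription, and roughly 775 hours of audio on the signup credit.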

Watch for: add-on features (diarization, summarization, redaction) are generally bundled into Nova-3 pricing, but some legacy models charge extra for them. Check the current pricing page before you put numbers in a finance spreadsheet.

What can Deepgram do that other APIs can't?

A few things stand out.

Real-time streaming that actually feels real-time. Sub-300ms end-to-end latency on a decent connection. If you're building a voice agent or a live captioning tool, that's the bar.
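A latency claim like this is worth checking against your own traffic. The sketch below assumes you have already logged per-utterance latencies in milliseconds (for example, the gap between sending the last audio chunk and receiving the matching transcript; how you capture that is up to your client). It summarizes them against a 300 ms budget using nearest-rank percentiles. The function names and sample values are illustrative.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are at or below it."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_latency(samples_ms: list[float], budget_ms: float = 300.0) -> dict:
    """Report median and tail latency, and whether the tail fits the budget."""
    p95 = percentile(samples_ms, 95)
    return {
        "p50_ms": percentile(samples_ms, 50),
        "p95_ms": p95,
        "within_budget": p95 <= budget_ms,
    }


# Illustrative numbers only, not measurements from this review.
samples = [120, 180, 210, 250, 240, 190, 410, 230, 200, 220]
print(summarize_latency(samples))
```

Judging on p95 rather than the average is a deliberate choice: a voice agent feels slow when the tail is slow, even if the median looks fine.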

Self-hosting. Most STT APIs are cloud-only. Deepgram offers on-premises and VPC deployments for customers who need data residency or hardware-bound inference. That's rare in this market and a quiet differentiator for regulated industries.

Custom model training. You can fine-tune on your domain audio. If your product handles medical dictation, sports commentary, or anything with vocabulary the public models butcher, that lever is genuinely useful.

Multilingual without switching models. Nova-3 covers 36+ languages and code-switches mid-sentence on the better-supported pairs. If your audio mixes English and Spanish in one utterance, that matters.

Where does Deepgram fall short?

It's not the right tool for everything.

No file-upload UI. Said this already, repeating because users keep landing here looking for a Rev replacement and bouncing. There is no drag-and-drop. Build your own or pick a different product.

Diarization is good, not great. Speaker labels are accurate enough for analytics but often need cleanup for publication. AssemblyAI's diarization is meaningfully better on hard multi-speaker audio; see our piece on why speaker labels are wrong for the failure modes that bite both vendors.
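Most of that cleanup starts with collapsing word-level speaker labels into readable turns so a human can scan for bad boundaries. A minimal sketch, assuming the diarized output gives you a list of words that each carry an integer `speaker` field; that shape is our assumption, so confirm it against the response you actually receive.

```python
from typing import TypedDict


class Word(TypedDict):
    # Assumed per-word fields in diarized output.
    word: str
    speaker: int


def group_turns(words: list[Word]) -> list[tuple[int, str]]:
    """Merge consecutive words from the same speaker into one turn."""
    turns: list[tuple[int, list[str]]] = []
    for w in words:
        if turns and turns[-1][0] == w["speaker"]:
            turns[-1][1].append(w["word"])
        else:
            turns.append((w["speaker"], [w["word"]]))
    return [(speaker, " ".join(tokens)) for speaker, tokens in turns]


words: list[Word] = [
    {"word": "hello", "speaker": 0},
    {"word": "there", "speaker": 0},
    {"word": "hi", "speaker": 1},
    {"word": "okay", "speaker": 0},
]
for speaker, text in group_turns(words):
    print(f"Speaker {speaker}: {text}")
```

One-word turns sandwiched between long turns from another speaker are the usual sign of a mislabel, and they are easy to spot in this form.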

Docs have rough edges. The core API is fine. The fringes — custom vocabularies, advanced redaction config, the gRPC endpoints — get into "read the source" territory faster than you'd expect.

Lock-in risk. Deepgram's model names and feature endpoints don't map cleanly to other vendors. Switching costs are real, especially if you've built on Nova-3-specific features like keyterm prompting.
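One common way to keep that switching cost down is to hide the vendor behind a thin interface of your own. The sketch below is a generic adapter pattern, not something Deepgram provides; every name in it is hypothetical, and the fake engine stands in for a real adapter that would call the vendor API and map its response.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Transcript:
    """Vendor-neutral result type that the rest of the app depends on."""
    text: str
    vendor: str


class Transcriber(Protocol):
    """Anything with a matching transcribe() method satisfies this."""
    def transcribe(self, audio: bytes) -> Transcript: ...


class FakeTranscriber:
    """Stand-in engine for illustration and tests; makes no network calls."""
    def transcribe(self, audio: bytes) -> Transcript:
        return Transcript(text=f"<{len(audio)} bytes>", vendor="fake")


def caption(audio: bytes, engine: Transcriber) -> str:
    """Application code sees only the Transcriber interface."""
    return engine.transcribe(audio).text


print(caption(b"abcd", FakeTranscriber()))
```

Vendor-specific features such as keyterm prompting still leak through an interface like this, so it narrows the lock-in without removing it.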

Who is Deepgram for?

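A short list:
  • Dev teams building real-time voice products
  • Call analytics
  • Voice agents
  • Anything regulated that needs on-prem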

If you're a researcher transcribing 20 interviews, you don't need this. If you're a journalist with one Zoom recording, you really don't. Pick a tool with a UI.

Deepgram vs the alternatives

Honest snapshot against the obvious comparisons. The full breakdown lives in our AssemblyAI alternatives roundup and the 2026 services comparison.

Should you use Deepgram?

If you're shipping a product that needs speech-to-text in the loop — yes, evaluate it seriously. Spend the free credits on your actual audio, not their demo files. Test diarization on your worst recording, not your best. Measure latency on the network conditions your real users have, not your office Wi-Fi.

If you're a one-off transcriber — no. Just transcribe the file in a browser and skip the SDK entirely.

FAQ

Does Deepgram have a free tier?

There's no permanent free tier, but new accounts get $200 in credits. That's enough to transcribe several hundred hours on the standard models — a real budget for evaluation, not a 30-second trial.

Can I use Deepgram without coding?

Not really. It's an API. You'd need a developer or a third-party tool that wraps Deepgram. Some workflow products do, but the native experience is code.

Is Deepgram HIPAA-compliant?

Deepgram offers HIPAA-eligible deployments for paid customers under a BAA. Pay-as-you-go is not HIPAA-eligible by default — talk to their sales team if you need it.

Does Deepgram support live transcription?

Yes. It's one of the product's core strengths. WebSocket streaming with sub-300ms latency on a stable connection.

Can I run Deepgram on-premises?

Yes. Deepgram offers self-hosted deployments for enterprise customers. That's unusual in the STT market and a real differentiator for regulated industries.

How does Deepgram pricing compare to OpenAI Whisper API?

Whisper API is $0.006/min flat. Deepgram Nova-3 streaming is around $0.0043/min on pay-as-you-go. At volume the gap widens because Deepgram offers committed-use discounts and Whisper API does not.

Sources