Your AI transcript just spelled your CEO's name three different ways. It turned "API" into "ape eye" once, "A.P.I." twice, and "API" only when it felt like it. Every medical term in that ten-minute call came out wrong.

The fix isn't switching engines. It's a small text file — a custom vocabulary — that almost every major speech-to-text API already accepts. You list the words your audio contains. The model biases its output toward those tokens. This post is the template.

Skip to the template if you just want to grab it. Otherwise, read on.

Key takeaways
  • A custom vocabulary is a flat list of proper nouns, acronyms, and jargon you pass with each transcription request.
  • Deepgram, AssemblyAI, AWS Transcribe, and Google STT all accept one, and Whisper takes the same terms through its prompt. Different parameter names, same idea.
  • It fixes recognition, not formatting and not audio quality. Keep the list under a few hundred entries, mostly proper nouns.

When a custom vocabulary actually helps

Custom vocabularies move the needle in four cases:

They don't fix bad audio, accented speech the model genuinely can't parse, or speaker labeling. We've written separately about why AI transcripts get names wrong and how accurate AI transcription is for accented English. If your problem lives in those buckets, a vocabulary file won't help much.

The template (copy this)

Save the block below as vocabulary.txt, one term per line. The # category headers are there for your own organization. Strip them before sending unless your engine's docs say comment lines are allowed; most APIs take the terms as a plain list.

# === People (replace with your real names) ===
Ananya Krishnan
Tomáš Novák
Siobhán O'Reilly
Mx. Quinn Park

# === Brands and products (yours + ones you mention) ===
Webflow
Kubernetes
PostgreSQL
Tailwind CSS
PagerDuty

# === Acronyms (pronounced as letters) ===
API
SDK
JWT
CTO
QA

# === Acronyms (pronounced as words / mixed) ===
SaaS
WYSIWYG
JSON
GIF

# === Technical jargon ===
idempotent
WebAssembly
RAG pipeline
backpressure
sharding

# === Medical example (replace with your domain) ===
metoprolol
electroencephalogram
COPD
PRN
NPO

# === Legal example (replace with your domain) ===
voir dire
res ipsa loquitur
mens rea
deposition
subpoena duces tecum

Strip the categories you don't need. Add the words your calls actually contain. Keep the list under a few hundred entries unless your engine specifically supports more — bigger isn't better here.
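If you keep the category headers in the file, a few lines of Python can strip them before the terms go anywhere. This is a minimal sketch, not part of any engine's SDK: the function names and the 300-term ceiling are assumptions chosen to stand in for "a few hundred".

```python
MAX_TERMS = 300  # assumed stand-in for "a few hundred"; check your engine's real limit


def load_vocabulary(text: str) -> list[str]:
    """Return the terms from vocabulary.txt content, in file order.

    Skips blank lines and '#' category headers, trims whitespace,
    and drops case-insensitive duplicates (the first spelling wins).
    """
    terms: list[str] = []
    seen: set[str] = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key = line.casefold()
        if key in seen:
            continue
        seen.add(key)
        terms.append(line)
    return terms


def within_limit(terms: list[str], limit: int = MAX_TERMS) -> bool:
    """True when the list is small enough to stay a lever, not a dictionary dump."""
    return len(terms) <= limit


sample = "# === Acronyms ===\nAPI\nSDK\n\n# === People ===\nTomáš Novák\napi\n"
terms = load_vocabulary(sample)
print(terms, within_limit(terms))
```

To run it on the real file, read the text first, for example with `pathlib.Path("vocabulary.txt").read_text(encoding="utf-8")`, and pass the result to `load_vocabulary`.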

How to format it for each major API

Same words, different wrappers.

If you're running Whisper yourself, the same initial_prompt trick applies. We covered the engine choice in Whisper vs faster-whisper.
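As a rough illustration of "same words, different wrappers", the sketch below turns one term list into request fragments for a few engines. The helper names are hypothetical and nothing here sends a request. The parameter names (`keywords` for Deepgram, `word_boost` and `boost_param` for AssemblyAI, `speechContexts` with `phrases` for Google's REST config, `initial_prompt` for Whisper) reflect our understanding at the time of writing; engines rename these between model versions, so confirm against current docs before relying on them.

```python
def deepgram_query_params(terms: list[str], boost: int = 2) -> list[tuple[str, str]]:
    # Assumption: repeated `keywords=term:boost` query parameters.
    # Newer Deepgram models may expect a different parameter; check the docs.
    return [("keywords", f"{term}:{boost}") for term in terms]


def assemblyai_fields(terms: list[str]) -> dict:
    # Assumption: `word_boost` takes a list of strings, `boost_param` a weight name.
    return {"word_boost": list(terms), "boost_param": "high"}


def google_stt_config(terms: list[str]) -> dict:
    # Assumption: REST-style RecognitionConfig fragment with one speech context.
    return {"speechContexts": [{"phrases": list(terms)}]}


def whisper_initial_prompt(terms: list[str]) -> str:
    # Whisper has no list parameter; the terms ride along as prompt text.
    # The prompt window is limited, so a short list is safer here.
    return "Glossary: " + ", ".join(terms) + "."


terms = ["Kubernetes", "PostgreSQL", "Siobhán O'Reilly"]
print(deepgram_query_params(terms))
print(assemblyai_fields(terms))
print(google_stt_config(terms))
print(whisper_initial_prompt(terms))
```

Keeping the wrappers as small pure functions means the vocabulary file stays the single source of truth, and switching engines only changes which function you call.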

What to put in (and what to leave out)

Be picky. The list is a lever, not a dictionary dump.

How to test that it actually worked

Don't trust the parameter set. Verify.

1. Pick a 60-second slice of a real call that contains five or six of your hard words.
2. Transcribe it once with no vocabulary attached. Note every miss.
3. Transcribe it again with the vocabulary attached. Diff the two outputs.
4. If a target term is still wrong, check the engine's docs for casing or boost rules. Google and AWS are case-sensitive in places people don't expect.
5. If the misses don't budge at all, the vocabulary probably isn't being attached to the request. A misspelled parameter name is often ignored silently instead of raising an error. We see this constantly.

This is the same measurement loop we describe in transcription accuracy: what to expect and what is Word Error Rate. Change one variable, measure again.
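The before-and-after comparison in the steps above can be scripted. This is a sketch under stated assumptions: the two transcripts are plain strings you already have, the function names are hypothetical, and matching is deliberately case-sensitive so that a casing miss counts as a miss.

```python
import difflib
import re


def contains_term(transcript: str, term: str) -> bool:
    """Case-sensitive whole-term match, so 'api' does not count as 'API'."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, transcript) is not None


def compare_runs(baseline: str, boosted: str, terms: list[str]) -> dict[str, tuple[bool, bool]]:
    """Map each term to (found without vocabulary, found with vocabulary)."""
    return {t: (contains_term(baseline, t), contains_term(boosted, t)) for t in terms}


def word_diff(baseline: str, boosted: str) -> list[str]:
    """Word-level changes only: '- word' was removed, '+ word' was added."""
    changes = difflib.ndiff(baseline.split(), boosted.split())
    return [c for c in changes if c.startswith(("+ ", "- "))]


baseline = "we shipped the ape eye on cooper netties last night"
boosted = "we shipped the API on Kubernetes last night"

for term, (before, after) in compare_runs(baseline, boosted, ["API", "Kubernetes"]).items():
    print(f"{term}: before={before} after={after}")
print(word_diff(baseline, boosted))
```

If every term reads `before=False after=False`, suspect the request rather than the list: that pattern usually means the vocabulary never reached the engine.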

What custom vocabularies won't fix

The cases where you need a different fix:

If you'd rather not wire this up yourself, you can transcribe a file with VTS and we'll handle the engine selection and the vocabulary plumbing for you.


A note on display formatting

Custom vocabularies bias recognition. They don't always control display — punctuation, capitalization, and spacing rules belong to a formatter that runs downstream of recognition. If "FedEx" keeps coming back as "Fed Ex," that's a formatting pass, not the vocabulary. AWS Transcribe's table format with a DisplayAs column is the cleanest workaround; on other engines, a post-processing find-and-replace is usually faster than fighting the formatter.
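For engines without a display column, the find-and-replace pass can be a few lines. The sketch below is one way to do it, assuming you maintain your own map from the spellings the engine keeps producing to the ones you want; the function name and the example map are illustrative.

```python
import re

# Hypothetical map: what the transcript says -> what it should display.
DISPLAY_FIXES = {
    "Fed Ex": "FedEx",
    "Pager Duty": "PagerDuty",
}


def apply_display_fixes(text: str, fixes: dict[str, str]) -> str:
    """Replace whole-phrase matches, ignoring case, with the preferred spelling."""
    for wrong, right in fixes.items():
        pattern = re.compile(r"(?<!\w)" + re.escape(wrong) + r"(?!\w)", re.IGNORECASE)
        # A function replacement keeps backslashes in `right` literal.
        text = pattern.sub(lambda _match, r=right: r, text)
    return text


print(apply_display_fixes("Send it by fed ex and page Pager Duty.", DISPLAY_FIXES))
```

The word-boundary guards keep the pass from rewriting longer words that merely start with the same letters, which matters once the map grows past a handful of entries.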
