The CFO said "modest" three times in the second-quarter call. The first quarter, only once. That's the kind of pattern an analyst catches by reading transcripts side-by-side, not by listening to four hours of replay.

If you cover stocks for a living, or you sit inside an IR team writing the post-call brief, you've already figured out the same thing. The audio is the event. The transcript is what you actually work with.

Here's the workflow that holds up when you're under a deadline and the CEO went off-script.

Why analysts transcribe earnings calls themselves

The official transcript posted on the company's IR page is usually accurate, but it shows up hours after the call ends, sometimes the next morning. Sell-side desks need the read-through faster than that. So do hedge funds running positions through earnings.

Free aggregators like Motley Fool publish lightly edited transcripts within a few hours, but they smooth out the disfluencies and reorder the Q&A, which kills exactly the signal you want. (Did the CEO hesitate before answering the macro question?)

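Running your own transcript solves both problems: you get the text soon after the call ends, with the hesitations and the original Q&A order intact.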

What makes a good earnings call transcript

Three things matter, in this order:

  1. Speaker labels that separate the prepared remarks from the analyst Q&A
  2. Accurate names: CEOs, CFOs, and the analysts asking questions
  3. Correct financial terms and ticker symbols

If you're working on the Mastercard call and the transcript says "master card" for the company, or "MC" for the ticker instead of "MA", you'll lose half an hour fixing it before you can grep.

Custom vocabulary lists help here. Most AI transcription tools let you pre-seed a list of expected terms: your covered tickers, executive names, product names. Whisper without a vocabulary list gets "ARPU" wrong as "are pew" more often than not.
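
If your tool doesn't take a vocabulary list, the same list can drive a cleanup pass after the fact. Here's a minimal sketch of that idea. The CORRECTIONS map and the fix_terms helper are hypothetical names, not part of any transcription tool, and the entries are just the mis-hearings mentioned above.

```python
import re

# Hypothetical map of mis-heard phrases to the form you want in the transcript.
# The entries are the examples from this article; extend it per covered name.
CORRECTIONS = {
    "master card": "Mastercard",
    "are pew": "ARPU",
}

def fix_terms(text, corrections=CORRECTIONS):
    """Replace known mis-hearings, matching whole words and ignoring case."""
    for wrong, right in corrections.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        text = pattern.sub(right, text)
    return text

print(fix_terms("Master card grew are pew by four percent."))
```

The word boundaries keep the pass from rewriting longer words that merely contain a listed phrase. It's a fallback, though: a term the model never heard correctly is easier to fix up front than to hunt down afterward.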

The workflow, step by step

1. Capture the audio.

Most public calls stream through services like Notified or Q4, or directly via the company's IR webcast. Recording is allowed (these are public dissemination events under Reg FD), but don't redistribute. Save as <TICKER>-Q<N>-<FY>.mp3.

2. Pre-process if needed.

Webcast audio is usually clean. If you grabbed it via a poor connection, run a noise reducer first. Even a clean Whisper run can't recover dropped packets.

3. Run transcription.

Upload with diarization on and a vocabulary list that includes the company's executives, the covered ticker, peer tickers mentioned in prepared remarks, and any product names. Allow 10 to 15 minutes for a 60-minute call.

4. Spot-check the prepared remarks.

Read the first three paragraphs against the audio. If those are clean, the rest usually is. If not, audio quality is the problem, not the model.

5. Tag the Q&A.

Each analyst introduces themselves ("Smith from Morgan Stanley"). Use that to confirm speaker labels and add the firm name to your transcript.

6. Diff against last quarter.

This is where the real work happens.
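
Two of the steps above are mechanical enough to script: the file naming in step 1 and the analyst tagging in step 5. The sketch below is one way to do it, under some loud assumptions. The function names, the "SPEAKER_02" style labels, the four-digit fiscal year, and the sample analysts are all hypothetical, and the regex only handles introductions shaped like "Name from Firm".

```python
import re

def call_filename(ticker, quarter, fiscal_year, ext="mp3"):
    """Build a name in the <TICKER>-Q<N>-<FY> pattern, e.g. MA-Q2-2025.mp3."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    return f"{ticker.upper()}-Q{quarter}-{fiscal_year}.{ext}"

# Assumed intro shape: "<Name> from <Firm>", every word capitalized.
INTRO_RE = re.compile(
    r"(?P<name>[A-Z][A-Za-z'-]+(?: [A-Z][A-Za-z'-]+)?) from "
    r"(?P<firm>[A-Z][A-Za-z&]+(?: [A-Z][A-Za-z&]+)*)"
)

def tag_analysts(turns):
    """Map diarization labels to 'Name (Firm)' using self-introductions.

    `turns` is a list of (speaker_label, text) pairs. Only the opening of
    each turn is searched, so a firm mentioned mid-answer is ignored.
    """
    names = {}
    for label, text in turns:
        match = INTRO_RE.search(text[:120])
        if match and label not in names:
            names[label] = f"{match['name']} ({match['firm']})"
    return names

# Made-up sample turns for illustration only.
turns = [
    ("SPEAKER_02", "Smith from Morgan Stanley. Can you talk about the margin outlook?"),
    ("SPEAKER_00", "Sure. We expect modest improvement in the second half."),
    ("SPEAKER_03", "Hi, Jane Doe from Example Capital. Two questions on pricing."),
]

print(call_filename("ma", 2, 2025))
print(tag_analysts(turns))
```

Treat the output as a first pass. An introduction that runs straight into another capitalized word, or a firm name with punctuation in it, will need a manual fix.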

Where AI transcripts get earnings call audio wrong

A few recurring failure modes:

Numbers that aren't said as numbers. When the CFO says "one twenty-five basis points of margin", you want "125 bps". Most tools give you the spelled-out version and you'll regex-clean it afterward.

Acronyms. ARPU, RPO, FCF, AUM. Without a vocabulary list, these come out garbled. With one, they're fine.

Acquired companies. If the CEO mentions a tuck-in that closed after the model's training cutoff, you'll get a phonetic guess. Add the acquired name to your vocabulary list before the call.

Analyst names. The "Smith from Morgan Stanley" introductions usually transcribe correctly, but accented analyst names trip up most models. See our piece on why AI transcripts get names wrong for the underlying cause.

Speaker switches in Q&A. Diarization on earnings call audio is harder than on Zoom because the feed is mixed before it reaches you. Expect to fix some speaker boundaries by hand. The patterns are covered in why speaker labels are wrong and how to fix them.
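
The first failure mode above, numbers that aren't said as numbers, is the one that yields to a regex cleanup. Here's a rough sketch for the basis-points case. The helper names are hypothetical, it only understands small spoken numbers, and it assumes the shorthand "one twenty-five" means 125.

```python
import re
from typing import Optional

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}

def words_to_int(phrase: str) -> Optional[int]:
    """Convert a short spoken number ('one twenty-five', 'fifty') to an int."""
    tokens = [t for t in phrase.lower().replace("-", " ").split() if t != "and"]
    if not tokens:
        return None
    total = 0
    for tok in tokens:
        if tok == "hundred":
            total = max(total, 1) * 100
        elif tok in UNITS or tok in TENS:
            value = UNITS.get(tok, TENS.get(tok))
            # Spoken shorthand: a single digit followed by a two-digit
            # number means hundreds ("one twenty-five" -> 125).
            if value >= 10 and 0 < total < 10:
                total = total * 100 + value
            else:
                total += value
        else:
            return None
    return total

# A run of number words followed by "basis point(s)".
_WORDS = "|".join(["hundred and", "hundred", *UNITS, *TENS])
BPS_RE = re.compile(rf"\b((?:(?:{_WORDS})[\s-]+)+)basis points?\b", re.IGNORECASE)

def normalize_bps(text: str) -> str:
    """Rewrite spelled-out basis points as digits, e.g. '125 bps'."""
    def repl(match):
        value = words_to_int(match.group(1))
        return match.group(0) if value is None else f"{value} bps"
    return BPS_RE.sub(repl, text)

print(normalize_bps("We saw one twenty-five basis points of margin expansion."))
```

Anything the parser can't read is left exactly as transcribed, which is the safer default when the number might end up in a note.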

How analysts compare transcripts across quarters

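Once you have the transcripts in plain text, the comparison work is mostly diff and grep: counting how often a word like "modest" shows up each quarter, and diffing this quarter's text against the last one line by line.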

Some quant desks run sentiment models over the full corpus. For most fundamental analysts, a careful side-by-side read is more useful. The model can flag changes, but you still have to decide if they matter.
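
For the side-by-side read, a few lines of standard-library Python cover the diff-and-grep part. This is a sketch, not a sentiment model: term_counts and changed_lines are hypothetical helper names, and the two "transcripts" are made-up stand-ins for real files.

```python
import difflib
import re
from collections import Counter

def term_counts(text, terms):
    """Count whole-word, case-insensitive occurrences of each term."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    return {term: words[term.lower()] for term in terms}

def changed_lines(prev, curr):
    """Return only the added (+) and removed (-) lines between two texts."""
    diff = difflib.unified_diff(
        prev.splitlines(), curr.splitlines(),
        fromfile="prev", tofile="curr", n=0, lineterm="",
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]

# Made-up excerpts standing in for two quarters of the same company.
q1 = "We expect modest growth in the second half.\nMargins were stable."
q2 = (
    "We expect modest growth, with modest headwinds in Europe.\n"
    "Margins were stable.\n"
    "Capex will be modest."
)

print(term_counts(q1, ["modest"]), term_counts(q2, ["modest"]))
for line in changed_lines(q1, q2):
    print(line)
```

The counts tell you where to look and the diff shows you the sentences. Whether a change in wording matters is still your call.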

Are AI transcripts reliable enough for research notes?

For internal notes, trade ideas, and meeting prep: yes, with a spot-check. AI accuracy on clean webcast audio runs in the high 90s when names and acronyms are in your vocabulary list. (For broader context on what to expect, see transcription accuracy: what to expect.)

For published research that goes to clients, you'll want either the company's posted transcript or a paid service that does human review. The cost-per-minute math usually favors AI for routine coverage and human for marquee names where a quote in a research note has to be exactly right.

If you're transcribing a video of an executive interview alongside the call (fireside chats, conference appearances, CNBC hits), the same workflow applies. The same vocabulary list carries over.

Compliance notes worth knowing

Most public companies make their earnings call audio explicitly available for download. Some restrict re-broadcast or republication, so internal research use is generally fine, but pasting a long block into a client-facing report needs the same care as quoting any public source.

Under Reg FD in the US, nothing said on the call is non-public. Your transcript is fair game for internal use the moment the call ends.

Sources