How to Transcribe a Deposition Recording Accurately

A deposition transcript is evidence. That changes everything about how you transcribe it. A 2% word error rate is fine for a podcast and a serious problem for a sworn statement, where one flipped "can" versus "can't" can move a case.

Here's the workflow I use when a paralegal or solo attorney sends me a Zoom-recorded deposition and asks for a clean, timestamped, speaker-labeled transcript they can actually file or quote from. It works for in-person recordings too, with one extra step at the front.

Is an AI transcript admissible in court?

Short answer: not as the official record. When the source is an audio recording, the recording is the evidence and the AI transcript is an aid. For a deposition that will be used at trial, the official transcript almost always has to come from a certified court reporter or a Certified Electronic Reporter, depending on the jurisdiction. In federal cases, Federal Rule of Civil Procedure 30(f)(1) requires the officer to certify in writing that the deposition accurately records the witness's testimony.

Where AI transcription earns its keep is everywhere upstream of that: the attorney's working copy, the deposition summary, the impeachment binder, the searchable index across forty hours of testimony. That's the use case this post covers. If you need an official certified transcript for filing, hire a reporter.

What audio format should I use?

If you recorded in Zoom, you already have an M4A audio file and an MP4 video file in your local recordings folder. Use the M4A. It's smaller, the audio is identical, and most transcription tools accept it directly.

If you recorded in person, get the file off the device as a WAV or a high-bitrate MP3 (192 kbps or higher). Avoid voice-memo formats that compress aggressively. The single biggest accuracy lever is the input, and we've written about that in detail in best practices for audio quality before transcribing.

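One deposition-specific tip: if the recording has separate channels for the deponent and counsel (sometimes the case with professional setups), keep them separate. Diarization works dramatically better on multi-track audio than on a single mixed channel, and with one speaker per track you can often skip diarization entirely: transcribe each track on its own and label it by channel.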
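If your recorder gives you a single stereo WAV with counsel on one channel and the witness on the other, a short script can split it into two mono files before upload. Here's a minimal Python sketch using only the standard library. It assumes 16-bit PCM input, and the function name and file paths are placeholders, not part of any tool.

```python
import wave
from array import array

CHUNK_FRAMES = 1 << 16  # frames per read; keeps memory flat on multi-hour files


def split_stereo_wav(src_path: str, left_path: str, right_path: str) -> None:
    """Split a 16-bit stereo PCM WAV into two mono WAVs, one per channel."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        with wave.open(left_path, "wb") as left, wave.open(right_path, "wb") as right:
            for out in (left, right):
                out.setnchannels(1)
                out.setsampwidth(2)
                out.setframerate(src.getframerate())
            while True:
                chunk = src.readframes(CHUNK_FRAMES)
                if not chunk:
                    break
                samples = array("h")  # signed 16-bit samples
                samples.frombytes(chunk)
                # Samples are interleaved L, R, L, R, ... so stride-2 slices de-interleave.
                left.writeframes(samples[0::2].tobytes())
                right.writeframes(samples[1::2].tobytes())
```

Listen to a few seconds of each output first, since which channel holds which speaker depends on how the recorder was wired.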

How do I get accurate speaker labels?

Depositions have a predictable cast: the deponent, the examining attorney, defending counsel, sometimes a second attorney, sometimes a videographer. That's rarely more than five voices, and they take long turns, which is the easy case for speaker diarization.

What trips up automated diarization is overlapping speech, and depositions are full of it. Objections get fired off mid-answer. Counsel interrupts to instruct the witness. A court reporter would say "one at a time, please." Your software can't ask, so it guesses.

Two things that help:

Identify each speaker once, at the top.

Most tools label speakers as Speaker 1, Speaker 2, etc. Do a five-minute pass at the start and rename them: "Ms. Alvarez (Plaintiff's Counsel)," "Witness," "Mr. Chen (Defense)." Find-and-replace handles the rest.
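If your tool exports plain text, the rename step can be scripted. This is a Python sketch; the label format ("Speaker 1:") and the names in the mapping are assumptions for illustration. Note the word boundary: a naive find-and-replace of "Speaker 1" also rewrites "Speaker 10".

```python
import re

# Hypothetical mapping, filled in after listening to the first few minutes.
SPEAKER_NAMES = {
    "Speaker 1": "Ms. Alvarez (Plaintiff's Counsel)",
    "Speaker 2": "Witness",
    "Speaker 3": "Mr. Chen (Defense)",
}


def rename_speakers(transcript: str, names: dict[str, str]) -> str:
    """Replace generic diarization labels with real names in a single pass."""
    # \b keeps "Speaker 1" from matching inside "Speaker 10"; one combined
    # pattern means text inserted by one rule is never rewritten by another.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: names[m.group(1)], transcript)
```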

Trust the structure, not the labels.

Q&A in a deposition follows a rigid pattern: question from counsel, answer from witness, occasional objection. If a line breaks that pattern, look at it. It's usually a misattribution.
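That structural check is easy to rough out in code. The sketch below is a hypothetical heuristic, not a feature of any particular tool: it flags a witness turn that follows another witness turn or ends in a question mark, and a counsel turn that follows counsel without starting with "Objection." It assumes speakers have already been renamed and that the witness is labeled "Witness".

```python
from typing import NamedTuple


class Turn(NamedTuple):
    speaker: str  # label after renaming, e.g. "Witness" or "Mr. Chen"
    text: str


def flag_pattern_breaks(turns: list[Turn], witness: str = "Witness") -> list[int]:
    """Return indexes of turns that break the question/answer/objection rhythm.

    Heuristic: a witness turn is suspicious if it follows another witness turn
    or ends in a question mark; a counsel turn is suspicious if it follows
    another counsel turn without starting with "Objection".
    """
    flagged = []
    for i in range(1, len(turns)):
        prev_is_witness = turns[i - 1].speaker == witness
        cur = turns[i]
        if cur.speaker == witness:
            if prev_is_witness or cur.text.rstrip().endswith("?"):
                flagged.append(i)
        elif not prev_is_witness and not cur.text.lstrip().lower().startswith("objection"):
            flagged.append(i)
    return flagged
```

Counsel appearing to answer while the witness appears to ask is what a swapped label looks like, and this catches both halves. Expect false positives (counsel saying "You can answer" after an objection gets flagged too); the point is to narrow down where to listen.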

For a deeper dive into what makes multi-speaker transcription hard, transcribing a Zoom recording with multiple speakers covers the same diarization quirks in a less adversarial setting.

Do I need timestamps in a deposition transcript?

Yes, and the granularity matters. Lawyers cite depositions in motions by page and line ("Smith Depo. 42:13–18"), but an AI tool doesn't give you page numbers. What you have is timestamps, and for the working copy they're actually more useful because they link straight back to the audio.

Use word-level or sentence-level timestamps if your tool supports them. They let you click a quote and hear it. That single feature has caught more misheard words in my workflow than any other QA step, because the ear catches what the eye glosses over.
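Tools that support word-level timestamps give you a start and end time per word; faster-whisper, for example, returns them when you call `transcribe` with `word_timestamps=True`. The sketch below shows the lookup that "click a quote and hear it" relies on. It uses a plain `Word` tuple, a hypothetical stand-in for whatever your tool exports, so it runs without a model.

```python
from typing import NamedTuple, Optional


class Word(NamedTuple):
    text: str     # token as the tool emitted it, e.g. " didn't"
    start: float  # seconds from the start of the recording
    end: float


def hms(seconds: float) -> str:
    """Format seconds as HH:MM:SS for a working-copy pin cite."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def _norm(token: str) -> str:
    # Compare case-insensitively and ignore surrounding spaces/punctuation.
    return token.strip(" .,;:?!\"'").lower()


def locate_quote(words: list[Word], quote: str) -> Optional[tuple[float, float]]:
    """Return (start, end) in seconds of the first exact match of quote, or None."""
    target = [_norm(t) for t in quote.split()]
    if not target:
        return None
    tokens = [_norm(w.text) for w in words]
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            return words[i].start, words[i + len(target) - 1].end
    return None
```

Because matching is exact, a quote from your memo that returns `None` is itself a signal that the memo and the transcript disagree.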

We've written about why granularity matters in getting the most out of timestamped transcripts if you want the longer argument.

How long does a deposition transcript take?

A seven-hour deposition is roughly a 50,000-word transcript. A good AI tool will produce the first draft in 10 to 30 minutes depending on the model. The review pass is where the real time goes: plan on one hour of review for every two hours of testimony if accuracy matters, more if there's specialized vocabulary (medical depositions, patent cases, anything with proper nouns).
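The arithmetic behind those numbers, as a back-of-envelope Python sketch. The 120 words per minute is back-solved from the figures above (about 50,000 words across seven hours) and the 0.5 review ratio is the one-hour-per-two-hours rule; both are assumptions to adjust per witness.

```python
def review_plan(testimony_hours: float,
                words_per_minute: float = 120,
                review_ratio: float = 0.5) -> dict[str, float]:
    """Back-of-envelope sizing for a deposition transcript and its review pass.

    words_per_minute ~120 is back-solved from ~50,000 words over seven hours;
    review_ratio 0.5 is one hour of review per two hours of testimony.
    Both are assumptions: raise review_ratio for specialized vocabulary.
    """
    return {
        "words": testimony_hours * 60 * words_per_minute,
        "review_hours": testimony_hours * review_ratio,
    }
```

`review_plan(7)` gives about 50,400 words and 3.5 hours of review. For a medical or patent deposition, bump `review_ratio` up.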

That's still an order of magnitude faster than transcribing by hand, which the Bureau of Labor Statistics notes is a specialized skill that takes years to develop.

To run a deposition through quickly, you can upload the file directly and have a draft transcript with speaker labels and timestamps in well under an hour. From there, the work is human review.

What should the review pass actually check?

This is the part that separates a useful working transcript from a liability. Don't just skim. Make a checklist and go through it:

The five things to verify on every deposition transcript:

Numbers and dates. "Fifteen" versus "fifty" is a common ASR error and a huge factual one. Anywhere a number appears, listen to the audio.
Proper nouns. Names of people, companies, exhibits, locations. AI tools spell what they hear, and they can hear "Synova" as "Cinnabar."
Negations. Did the witness say "I did" or "I didn't"? "I can" or "I can't"? These are the highest-stakes errors.
Objections. Make sure every "Objection, form" or "Objection, foundation" is captured and attributed to the right attorney.
Off-the-record exchanges. When counsel goes off-record, the audio sometimes keeps rolling. Mark those sections clearly so they don't get quoted.
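You can't automate the listening, but you can automate finding where to listen. Here's a hypothetical Python pass that tags transcript lines containing numbers, negations, objections, or off-the-record language so you can jump straight to them. The patterns are my assumptions, deliberately broad, and they don't cover proper nouns; a per-matter spelling list of names and exhibits works better for those.

```python
import re

# Hypothetical patterns for the checklist; broad on purpose, so expect noise.
CHECKS = {
    "number/date": re.compile(
        r"\d|\b(?:zero|one|two|three|four|five|six|seven|eight|nine|ten|"
        r"eleven|twelve|(?:thir|four|fif|six|seven|eigh|nine)teen|"
        r"(?:twen|thir|for|fif|six|seven|eigh|nine)ty|hundred|thousand|million)\b",
        re.IGNORECASE,
    ),
    "negation": re.compile(r"\b(?:not|no|never|nor|cannot|\w+n['’]t)\b", re.IGNORECASE),
    "objection": re.compile(r"\bobjection\b", re.IGNORECASE),
    "off-record": re.compile(r"\boff[- ](?:the[- ])?record\b", re.IGNORECASE),
}


def review_flags(lines: list[str]) -> list[tuple[int, list[str]]]:
    """Return (line index, reasons) for every line worth checking against the audio."""
    flagged = []
    for i, line in enumerate(lines):
        reasons = [name for name, pattern in CHECKS.items() if pattern.search(line)]
        if reasons:
            flagged.append((i, reasons))
    return flagged
```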

How do I handle confidential or privileged material?

Depositions routinely contain protected health information, trade secrets, and material covered by protective orders. If you're using a cloud transcription service, read the data processing terms before you upload. Look for: data retention policy, whether your audio is used to train models, where the data is processed, and whether the vendor will sign a business associate agreement (BAA) if you need one for HIPAA.

For highly sensitive matters, a local transcription pipeline using something like faster-whisper running on your own machine is worth the setup cost. The audio never leaves your laptop. We compared the tradeoffs in faster-whisper vs OpenAI whisper.
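A minimal local pipeline might look like the sketch below, assuming `pip install faster-whisper` and enough memory for the model you pick. The model size, `compute_type`, and prompt wording are my choices, not requirements. `initial_prompt` nudges the model toward your spellings of names, and `vad_filter` skips stretches with no speech. The first run downloads model weights; after that, the audio stays on the machine. faster-whisper doesn't do speaker diarization, so speaker labels still need a separate step.

```python
def hms(seconds: float) -> str:
    """Format seconds as HH:MM:SS."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def transcribe_locally(audio_path: str, out_path: str, names: list[str]) -> None:
    """Transcribe on this machine with faster-whisper and write timestamped lines."""
    # Imported here so the helper above works without the package installed.
    from faster_whisper import WhisperModel

    # Model size and compute type are choices, not requirements;
    # "int8" keeps memory use down on CPU.
    model = WhisperModel("large-v3", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(
        audio_path,
        language="en",
        initial_prompt="Deposition. Names: " + ", ".join(names),  # spelling hints
        vad_filter=True,  # skip stretches with no speech
    )
    with open(out_path, "w", encoding="utf-8") as out:
        for segment in segments:  # a lazy generator: decoding happens here
            out.write(f"[{hms(segment.start)}] {segment.text.strip()}\n")
```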

What about exhibits and read-backs?

Exhibits introduced during the deposition are referenced by number ("Exhibit 7"), and the witness often reads passages aloud. Two things to do in review:

First, when an exhibit is marked, insert a line like [EXHIBIT 7 MARKED] in brackets so the transcript reflects the procedural moment. Second, when the witness reads from a document, distinguish read-aloud text from testimony. Indented quotes work well, or a tag like [reading].
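If counsel uses consistent phrasing when marking exhibits, the first half of that can be roughed in automatically. This Python sketch assumes phrasing like "marked as Exhibit 7". The pattern is a hypothetical starting point, since wording varies by reporter and firm, so still read for markings it misses.

```python
import re

# Hypothetical phrasing: "what's been marked as Exhibit 7", "I'll mark as Ex. 12".
MARKING = re.compile(
    r"\bmark(?:ed)? (?:for identification )?as (?:Exhibit|Ex\.) ?(\d+)",
    re.IGNORECASE,
)


def add_exhibit_markers(lines: list[str]) -> list[str]:
    """Copy lines through, adding an [EXHIBIT N MARKED] line after each marking."""
    out = []
    for line in lines:
        out.append(line)
        for number in MARKING.findall(line):
            out.append(f"[EXHIBIT {number} MARKED]")
    return out
```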

These conventions aren't standardized across firms, but pick a system and apply it consistently across the matter. The deposition summary you produce next will thank you.

The honest bottom line

For a working transcript that an attorney can search, summarize, quote in a memo, and use to prep for trial, modern AI transcription is genuinely good. Word error rates on clean deposition audio sit in the 3 to 8 percent range depending on the tool and the speakers. That's workable with a real review pass.
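For reference, word error rate is the word-level edit distance between a reference transcript and the tool's output, divided by the number of reference words. A small Python version, assuming whitespace tokenization and case-insensitive comparison (published benchmarks usually normalize punctuation too):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # reference word deleted
                cur[j - 1] + 1,          # extra word inserted
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            )
        prev = cur
    return prev[-1] / len(ref)
```

One flipped "can" in a five-word answer is already 20 percent WER on that line. At 3 percent across a 50,000-word deposition, you're still looking at roughly 1,500 word errors; at 8 percent, about 4,000.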

For the certified transcript that gets filed with the court, hire a reporter. The two workflows complement each other: the reporter's record is the official one, and your AI-generated working copy is what you actually use day to day. That's been the pattern at every litigation team I've worked with, and it's the one I'd recommend.