Qualitative research lives or dies on the quality of your transcripts. A good NVivo project starts with text that's accurate, properly attributed to speakers, and easy to move around. A messy transcript turns coding into a tax: you spend the first hour fixing speaker labels and the next two arguing with yourself about whether a paragraph break is a meaning unit or a fluke.

This guide is for researchers who already use NVivo (or are about to) and want a workflow that doesn't make them dread the import step.

Key takeaways
  • Format .docx with one speaker turn per paragraph and consistent labels before you import.
  • Use project-wide participant IDs (P01, P02), not file-local Speaker 1.
  • Edit AI transcripts before coding. AI mishears words you'll then code as evidence.
  • Strip timestamps from the speaker label or NVivo's auto-code will fragment your Cases.

What "coding" means in NVivo, briefly

Coding is the act of attaching one or more labels to a passage of text. In NVivo's vocabulary those labels are called Nodes (NVivo releases from 2020 onward call them Codes). The Node is your theme, code, or concept. The highlighted text is the evidence. Over a project you build up a hierarchy of Nodes (parent themes with child sub-themes), then run queries to see who said what about each.

NVivo doesn't care whether your transcript came from a human typist or a machine. It cares about formatting: paragraphs, speaker labels, and whether you're treating speakers as Cases.

The transcript format NVivo actually wants

The path of least resistance is plain .docx. If you format speaker turns consistently, NVivo's Auto Code feature can split the transcript by speaker into separate Cases. That's what most researchers want for comparison queries later.

A few rules that save hours:

Sounds fussy. It is. Fixing this at the transcription stage costs ten minutes. Fixing it after you've already coded a hundred passages costs you a weekend.
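
One way to catch these problems before import is a small lint pass over the paragraphs. The sketch below is an illustration in Python, not an NVivo feature. It assumes a `LABEL: text` convention with `INT` for the interviewer and `P01`-style participant IDs; the function name `lint_transcript` and both regexes are hypothetical. Paragraphs are taken as plain strings, so reading them out of a .docx would need a separate library.

```python
import re

# Assumed convention: the label is INT or a two-digit participant ID like P01.
LABEL_RE = re.compile(r"^(INT|P\d{2})$")
# Anything that looks like mm:ss or hh:mm:ss inside the label.
TIMESTAMP_RE = re.compile(r"\d{1,2}:\d{2}")


def lint_transcript(paragraphs):
    """Return a list of (paragraph_number, problem) tuples."""
    problems = []
    for number, paragraph in enumerate(paragraphs, start=1):
        # Everything before the first ": " is treated as the speaker label.
        label, separator, _text = paragraph.partition(": ")
        if not separator:
            problems.append((number, "no speaker label"))
        elif TIMESTAMP_RE.search(label):
            problems.append((number, "timestamp inside speaker label"))
        elif not LABEL_RE.match(label.strip()):
            problems.append((number, f"unexpected label {label.strip()!r}"))
    return problems


sample = [
    "INT: Can you tell me about the consent process?",
    "P01 [00:01:12]: It felt rushed, honestly.",
    "Speaker 2: I agree.",
    "And then we moved on.",
]
for number, problem in lint_transcript(sample):
    print(number, problem)
```

Run against the sample, this flags paragraphs 2, 3, and 4 and leaves paragraph 1 alone. Adjust the label pattern to whatever ID scheme your project uses.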

From recording to coded data: the full workflow

The path most researchers I know actually follow:

1. Record clean audio.

A lavalier or boundary mic, a quiet room, consent paperwork on file. The interview recording checklist covers the small things that make a big accuracy difference downstream.

2. Get a first-pass transcript fast.

A 60-minute interview that once took a human typist four hours now takes an AI transcription tool about three minutes and costs a few dollars. You can transcribe an interview file in the background while you make coffee.

3. Edit for verbatim accuracy.

Listen along with the audio and clean up what your tool got wrong: names, jargon, overlapping speech. Decide upfront whether you want strict verbatim (every "um", every false start) or intelligent verbatim (the meaning, cleaned up). For most qualitative analysis, intelligent is enough. For conversation analysis or linguistic work, you need strict. See the verbatim vs intelligent decision for the trade-offs.

4. Standardize speaker labels.

Find-and-replace through every file so Speaker 1 becomes INT, Speaker 2 becomes P01, and so on. Use participant IDs that are stable across the whole project.
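
If you have more than a handful of files, the find-and-replace can be scripted. The following is a minimal Python sketch under stated assumptions: each paragraph is already a plain string beginning with `Label:`, and the `relabel` function, the `mappings` dictionary, and the file names are all hypothetical. The mapping is kept per file because Speaker 1 in one recording may be a different person in the next.

```python
import re


def relabel(paragraphs, mapping):
    """Swap each paragraph's leading file-local label for its project-wide ID."""
    if not mapping:
        return list(paragraphs)
    alternatives = "|".join(re.escape(label) for label in mapping)
    # Anchored at the start so a mention of "Speaker 2" mid-sentence is untouched.
    pattern = re.compile(rf"^({alternatives}):")
    return [
        pattern.sub(lambda match: mapping[match.group(1)] + ":", paragraph)
        for paragraph in paragraphs
    ]


# Hypothetical per-file mappings from file-local labels to project-wide IDs.
mappings = {
    "interview_01": {"Speaker 1": "INT", "Speaker 2": "P01"},
    "interview_02": {"Speaker 1": "P02", "Speaker 2": "INT"},
}

interview_02 = [
    "Speaker 2: How did you hear about the study?",
    "Speaker 1: A colleague mentioned it.",
]
print(relabel(interview_02, mappings["interview_02"]))
```

Keeping the mapping in one place also gives you a record of who is who, which is useful when you write up the methods section.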

5. Import into NVivo.

File → Import → Files. Select your .docx set.

6. Auto Code by speaker.

Right-click the file → Auto Code → Speaker name pattern. NVivo creates a Case for each speaker. Now your queries can ask "what did P03 say about consent?" without you doing anything else.
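
Conceptually, coding by speaker amounts to grouping every paragraph under the label that precedes it. The Python sketch below shows that idea only. It is not NVivo's implementation, and `group_by_speaker` and the sample transcript are hypothetical. It assumes the label is whatever comes before the first `": "` in a paragraph.

```python
from collections import defaultdict


def group_by_speaker(paragraphs):
    """Group paragraph text under the label that precedes the first ': '."""
    cases = defaultdict(list)
    for paragraph in paragraphs:
        label, separator, text = paragraph.partition(": ")
        if separator:  # paragraphs without a label are skipped
            cases[label].append(text)
    return dict(cases)


transcript = [
    "INT: What did consent look like for you?",
    "P03: I signed a form, but nobody explained it.",
    "INT: Did that change later?",
    "P03 [00:04:10]: Yes, the second visit was better.",
]
cases = group_by_speaker(transcript)
print(sorted(cases))
```

Under this assumption the transcript yields three groups rather than two, because the label carrying a timestamp is treated as a separate speaker. That is the kind of fragmentation that clean, stable labels prevent.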

7. Code thematically.

Open a transcript, highlight a passage, drag it to a Node. Build the codebook as you go. Memo as you code. The memo is where the insight actually lives.

Skip step 4 and step 6 falls apart. You'll be stuck merging Cases by hand for hours.

Coding speaker turns vs. coding meaning units

Two camps. One says the analytical unit is the speaker turn: whatever the participant said before the interviewer spoke again. The other says it's the meaning unit, a phrase or sentence or paragraph that expresses one idea, even if it's a fragment of a longer turn.

Neither is wrong. Turn-based is faster and gives cleaner queries by speaker. Meaning-unit is slower but gives you finer-grained analysis with more decisions per page.

If your research question is "how do these participants frame X?", code meaning units. If it's "do these two groups talk about X differently?", turns are fine.

NVivo handles both. Just be consistent within a project. Half turn-level, half sentence-level, and your frequency counts mean nothing.
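
A toy example makes the counting problem concrete. In this Python sketch a keyword match stands in for a coding decision, and splitting on sentence-ending punctuation stands in for meaning units. Both are simplifying assumptions, and the function names are hypothetical.

```python
import re


def turn_units(turns):
    """One unit per speaker turn."""
    return list(turns)


def sentence_units(turns):
    """A crude stand-in for meaning units: split each turn at sentence ends."""
    units = []
    for turn in turns:
        pieces = re.split(r"(?<=[.!?])\s+", turn.strip())
        units.extend(piece for piece in pieces if piece)
    return units


def count_mentions(units, keyword):
    """Count the units that mention the keyword, ignoring case."""
    return sum(keyword.lower() in unit.lower() for unit in units)


turns = [
    "The consent form was long. I skimmed the consent part. Then I signed.",
    "Nobody mentioned consent again.",
]
print(count_mentions(turn_units(turns), "consent"))      # 2
print(count_mentions(sentence_units(turns), "consent"))  # 3
```

The same two turns give a count of 2 at turn level and 3 at sentence level. Mix the two granularities across files and the totals stop being comparable.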

Common mistakes when coding AI-generated transcripts

A few patterns that cost researchers time:

NVivo vs. Atlas.ti vs. MAXQDA: does the transcript work the same?

Mostly yes. All three CAQDAS tools accept .docx and .txt. All three have a speaker auto-code feature. All three handle thematic and case-based coding.

Differences worth knowing:

In all three, the transcript quality determines the analysis quality. There's no software workaround for a transcript where the speakers are mislabeled.

A note on IRB and consent

If your IRB requires participants to consent to AI processing of their audio, your consent form needs to say so explicitly. The interview consent form templates post has language you can adapt. Keep the original audio under access controls. Don't upload recordings to tools that retain your data for training.

The transcript is the foundation. If the bottom is uneven, every code, every theme, every quote you pull for the methods chapter is uneven too. A clean import is worth the hour it costs you.
