Apple vs Whisper vs Parakeet: We Tested 4 Speech-to-Text Engines on 13,000 Recordings

We benchmarked the four speech-to-text engines that ship inside Dictato — Apple SpeechAnalyzer, WhisperKit, Parakeet, and Parakeet + Qwen3 — on 13,000 real recordings in 5 languages. Here’s which one wins in your situation.

If you’ve ever searched for the best dictation app for Mac, you’ve probably read a dozen reviews that all sound the same. Someone tries each app for ten minutes, dictates the same paragraph, declares a winner, and moves on.

That tells you almost nothing.

No speech-to-text engine wins everywhere. The model that nails clean read-aloud audio falls apart on a strong accent. The one that handles fillers like a champion mangles medical terminology. The one with the best published Word Error Rate quietly returns empty strings on 7% of real-world clips, and the only way you’d know is if you measured it.

So we measured it. We benchmarked the four speech-to-text engines that ship inside Dictato on 13,000 audio samples, 5 languages, and 7 situations that actually break dictation in real life. Same engines you get when you install the app — same audio, same metrics. Here’s what we found.

The four engines we ship in Dictato

These aren’t four engines we hand-picked for a blog post. They’re the four engines that come with Dictato, and you can switch between them in one click. Every one of them runs fully on-device on Apple Silicon — nothing leaves your machine.

  • Apple SpeechAnalyzer — the new on-device engine that ships with macOS 26. Apple’s first serious answer to Whisper.
  • WhisperKit — OpenAI’s Whisper, ported to run natively on Apple Silicon.
  • Parakeet (FluidAudio) — NVIDIA’s Parakeet model, optimized for the Apple Neural Engine.
  • Parakeet + Qwen3 proofread — Parakeet output, then refined by an on-device language model running through Apple Foundation Models.

We picked these four because together they cover what a serious offline dictation app on Mac can realistically ship in 2026. The benchmark below is exactly the test we run on every release of Dictato to decide what’s good enough to ship.

How we tested

Most reviews hand-pick a few sentences. We did the opposite.

  • 13,023 audio samples drawn from three different sources
  • 5 languages: English, French, Spanish, German, Italian
  • 7 situations: clean read-aloud, accented English, disfluent speech (fillers, restarts), technical jargon (medical, scientific), longform clips over 30 seconds, brand-name vocabulary, and proper nouns
  • ~7 hours of runtime per full benchmark pass

Each engine transcribed every sample. We compared the output to the human-written reference and computed Word Error Rate — the percentage of words the engine got wrong. Lower is better. Zero is perfect.
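The metric itself is simple enough to sketch in a few lines: word-level edit distance (substitutions, insertions, and deletions) divided by the number of words in the reference. A minimal illustration of the idea, not our exact scoring script:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / max(len(ref), 1)

# "the cat sat" vs "the cat sat down": one inserted word over 3 reference words
print(word_error_rate("the cat sat", "the cat sat down"))  # → 0.333...
```

Production scorers also normalize punctuation, numbers, and casing before comparing, which matters more than you’d think; the core arithmetic is the part shown here.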

If you want a broader Mac dictation comparison, we’ve tested 9 apps side by side. What’s below is engine-level: the speech recognition cores under the hood.

Result 1: Apple SpeechAnalyzer wins clean speech in 3 of 5 languages

Read aloud from a script — calm pace, no interruptions — and Apple’s new SpeechAnalyzer is the most accurate engine across French, German, and Italian.

  Language   Best engine            Word Error Rate
  English    WhisperKit             5.2%
  French     Apple SpeechAnalyzer   7.3%
  Spanish    WhisperKit             4.5%
  German     Apple SpeechAnalyzer   6.7%
  Italian    Apple SpeechAnalyzer   4.0%

This was a surprise. Apple was a punchline in speech recognition for years. With macOS 26, that has quietly stopped being true. The Italian result — 4.0% WER — is the best number we’ve seen from any on-device engine on any platform.

WhisperKit still leads English and Spanish, but the gap is small and Apple is closing it fast.

Result 2: Parakeet wins when people actually talk

Clean read-aloud is a useful benchmark, but it’s not how anyone really dictates. Real speech has “um”s, restarts, “I mean, what I meant was”, false starts, and pauses.

On disfluent speech, Parakeet wins in 3 of 5 languages. It’s tuned to drop fillers and reconstruct sentences cleanly, which is exactly what you want when you’re thinking out loud.

If you dictate the way you talk — with hesitations, restarts, and the occasional “wait, scratch that” — Parakeet is your engine.

Result 3: Add an LLM proofread for technical jargon

Parakeet has one weakness, and so does Whisper, and so does Apple: technical and medical terminology. The terms are too rare in the training data and the models guess phonetic neighbors instead.

The fix is to pipe the raw transcription through an on-device language model that knows the words. We tested Parakeet + Qwen3 (running through Apple Foundation Models on macOS 26).

The drop is dramatic. Word Error Rate on jargon falls by roughly half across all 5 languages — from around 20% with raw Parakeet to around 10% with the proofread layer.
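The shape of that pipeline is simple: raw transcript in, instruction-wrapped prompt to the language model, corrected transcript out. A hedged sketch — `run_llm` here is a placeholder for whatever on-device model you call (in Dictato’s case, Qwen3 via Apple Foundation Models), and the stub below exists only so the example runs:

```python
from typing import Callable

PROOFREAD_INSTRUCTION = (
    "Correct transcription errors in the text below, paying special "
    "attention to technical and medical terminology. Preserve the "
    "speaker's wording otherwise. Return only the corrected text."
)

def proofread_transcript(raw: str, run_llm: Callable[[str], str]) -> str:
    """Wrap the raw transcript in a correction prompt and run it
    through a language model, passed in as a callable."""
    prompt = f"{PROOFREAD_INSTRUCTION}\n\n{raw}"
    return run_llm(prompt).strip()

# Stand-in "LLM" for illustration: fixes one phonetic-neighbor error.
def stub_llm(prompt: str) -> str:
    text = prompt.split("\n\n", 1)[1]
    return text.replace("met form in", "metformin")

print(proofread_transcript("prescribed met form in 500 mg", stub_llm))
# → prescribed metformin 500 mg
```

The interesting design constraint is the last line of the instruction: the model must return only the corrected text, or you end up stripping chatty preambles out of your transcripts.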

If you dictate medical notes, legal terminology, scientific papers, or anything with specialized vocabulary, you want the proofread step on. It’s free quality if your Mac has the cycles to spare, and modern M-series chips usually do.

Result 4: Brand names and proper nouns need vocabulary boosting

“OpenAI.” “Anthropic.” “Sundar Pichai.” Your client’s company name. Your colleague’s last name. None of these are in any model’s training data.

Generic transcription mangles them every time. The fix is vocabulary boosting: you give the engine a list of words to expect, and it biases its predictions toward those words.

WhisperKit’s prompt-biasing approach was the most effective in our tests — closest to 100% recall on brand names. Apple and Parakeet are weaker here today, though that should change as the underlying APIs mature.
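Vocabulary boosting can happen at decode time (that’s what WhisperKit’s prompt biasing does) or as a post-pass over the finished transcript. To make the post-pass flavor concrete, here’s a toy Python sketch that snaps near-miss words to the closest entry in a custom vocabulary; the fuzzy-match cutoff and single-word matching are illustrative assumptions, not Dictato’s actual implementation:

```python
import difflib

def boost_vocabulary(transcript: str, vocab: list[str],
                     cutoff: float = 0.8) -> str:
    """Replace words that closely match a custom-vocabulary entry
    (case-insensitive fuzzy match) with the canonical spelling."""
    by_lower = {v.lower(): v for v in vocab}
    out = []
    for word in transcript.split():
        # cutoff=0.8 is an arbitrary illustrative threshold
        match = difflib.get_close_matches(word.lower(), by_lower,
                                          n=1, cutoff=cutoff)
        out.append(by_lower[match[0]] if match else word)
    return " ".join(out)

vocab = ["Anthropic", "Dictato", "Qwen3"]
print(boost_vocabulary("i emailed anthropik about dictato", vocab))
# → i emailed Anthropic about Dictato
```

A real implementation also has to handle multi-word names ("Sundar Pichai") and avoid false positives on common words, which is why decode-time biasing tends to beat naive post-correction.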

In Dictato, you can set a custom vocabulary list per context (medical, legal, your company), and the active engine picks it up automatically.

So what do you actually pick?

The honest answer: it depends on what you dictate.

If your usage is uniform — only Slack messages, only blog drafts, only meeting notes — pick the engine that wins your category and stick with it. You’ll get the best result for your situation and you’ll never have to think about it again.

If your usage is mixed, you want an app that switches engines based on context. A clinical note wants Parakeet + Qwen3. A quick email with company names wants WhisperKit with a vocabulary list. A long voice memo wants Apple SpeechAnalyzer.
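At its core, that routing logic is a lookup table from context to engine. A minimal sketch following the benchmark results above — the context names and mapping are illustrative, not Dictato’s internals:

```python
# Illustrative mapping from dictation context to the engine that
# won that category in the benchmark above.
ENGINE_BY_CONTEXT = {
    "clinical_note":  "Parakeet + Qwen3 proofread",  # technical jargon
    "email":          "WhisperKit",                  # brand names + vocab list
    "voice_memo":     "Apple SpeechAnalyzer",        # clean longform speech
    "thinking_aloud": "Parakeet",                    # disfluent speech
}

def pick_engine(context: str, default: str = "Parakeet") -> str:
    """Return the benchmark-preferred engine for a context,
    falling back to a sensible default."""
    return ENGINE_BY_CONTEXT.get(context, default)

print(pick_engine("clinical_note"))  # → Parakeet + Qwen3 proofread
```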

This is what Dictato does. All four engines ship in the app. Pick a default, or let it adapt. No subscription. No cloud. The benchmark above is what we use internally to decide what to ship.

Why this kind of testing matters

A 10-minute review tells you which app felt fastest. A real benchmark tells you which app actually returns text that matches what you said, across the situations you actually encounter.

We rerun this benchmark on every release. When a fix lands, we know whether quality improved or regressed — in numbers, across thousands of recordings. Not just on the one sentence we happened to test.
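A release gate like that boils down to comparing per-category WER against the previous run and flagging anything that got worse. A toy sketch — the tolerance and the numbers are made up for illustration:

```python
def gate_release(baseline: dict[str, float],
                 candidate: dict[str, float],
                 tolerance: float = 0.005) -> list[str]:
    """Return the categories where the candidate build's WER
    regressed beyond the allowed tolerance versus the baseline."""
    return [cat for cat, wer in candidate.items()
            if wer > baseline.get(cat, 1.0) + tolerance]

baseline  = {"clean": 0.052, "jargon": 0.10}  # previous release (made-up numbers)
candidate = {"clean": 0.050, "jargon": 0.13}  # new build (made-up numbers)
print(gate_release(baseline, candidate))  # → ['jargon']
```

The point isn’t the three lines of code; it’s that "did we regress" becomes a yes/no answer per category instead of a vibe.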

That’s the difference between “trust me, bro” and “here’s the data.”

If you want to see how this translates to a working dictation app on your Mac, Dictato ships all four engines, runs offline, and uses this exact benchmark to decide what to release.


Want real-time dictation that actually works on your Mac? Dictato brings four speech-to-text engines, vocabulary boosting, and 80ms latency to any app. 100% offline. No subscription. Try it free for 7 days.