
How Voyc Delivers Accurate Transcripts

Written by Joyc from Voyc
Updated this week

Our Promise to You

We know transcripts need to be reliable. If words are missed or misheard, it can change the meaning of a whole conversation. That’s why Voyc is built on a custom Deepgram Nova speech-to-text model, among the most advanced and deepest-trained in the world.

We can’t promise 100% perfection; no provider can. But we do promise:

  • We use top-tier technology that consistently reduces errors compared to alternatives.

  • We build on the deepest-trained model on the market and adapt it with in-domain data, making it more accurate for the voices, accents and topics our customers use every day.

  • We standardise all audio into a unified format before it is transcribed to improve consistency.

  • We support custom vocabulary for your brand, products and acronyms.

  • We use confidence scoring, so you know where to focus.

  • We are always improving, using your feedback to guide where accuracy can be pushed further.
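Confidence scores tell you how sure the model was about each part of the transcript. As a minimal sketch, assuming each word arrives with a score between 0 and 1 (Deepgram’s API does return a per-word `confidence` value, but the `Word` type, the `low_confidence_words` helper and the 0.6 threshold below are hypothetical and not Voyc’s implementation), flagging the words worth a second look could be as simple as:

```python
from typing import NamedTuple


class Word(NamedTuple):
    """One transcribed word with its confidence score (hypothetical type)."""
    text: str
    confidence: float  # assumed range: 0.0 (unsure) to 1.0 (certain)


def low_confidence_words(
    words: list[Word], threshold: float = 0.6
) -> list[tuple[int, Word]]:
    """Return (position, word) pairs whose confidence is below the threshold.

    The 0.6 default is an illustrative assumption, not a recommended value.
    """
    return [(i, w) for i, w in enumerate(words) if w.confidence < threshold]


words = [
    Word("please", 0.98),
    Word("cancel", 0.95),
    Word("my", 0.97),
    Word("border", 0.41),
]
print(low_confidence_words(words))
# → [(3, Word(text='border', confidence=0.41))]
```

Returning the position alongside the word makes it easy to jump straight to that point in the conversation when reviewing it.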

What is WER?

Transcription accuracy is usually measured by Word Error Rate (WER): the number of words a system substitutes, deletes or inserts, divided by the number of words in a human reference transcript. Lower is better. Even a small drop in WER can mean a big jump in readability.
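WER is normally computed as (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions needed to turn the system’s output into the reference transcript, and N is the number of words in the reference. Below is a minimal Python sketch using word-level edit distance. The function name and the simple lower-case/whitespace normalisation are assumptions for illustration; published benchmarks typically normalise punctuation and numbers more thoroughly before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N via word-level Levenshtein distance."""
    # Simplified normalisation (assumption): lower-case, split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word was missed
                curr[j - 1] + 1,     # insertion: extra word in the output
                prev[j - 1] + cost,  # substitution, or a match if cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("please cancel my order today", "please cancel my border"))
# → 0.4  (one substitution + one deletion, over 5 reference words)
```

Because insertions count as errors, WER can exceed 100% when a system adds many words that were never spoken.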

Why Deepgram

Benchmarks published by Deepgram show just how well Nova performs compared with other leading providers.

Source: Deepgram, “How do you evaluate performance of a speech-to-text API?”, https://deepgram.com/learn/best-speech-to-text-apis

In these results, Deepgram’s Nova models consistently outperform other leading providers by a wide margin. That’s why Voyc chose Deepgram as its transcription partner.

📝 A note on benchmarks: WER is a useful way to compare systems, but results vary with real-world conditions like accents, jargon and background noise. Benchmarks are an indicator of relative performance, not a guarantee of exact accuracy in every call.

How Does It Compare to Humans?

For decades, transcription AI has treated human-level accuracy as the benchmark to aim for. Professional human transcribers typically achieve around 4% WER under good conditions. Our custom Nova model already operates very close to that standard, showing how near automatic speech recognition has come to the long-standing goal of human-level accuracy.
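To make the 4% figure concrete, here is the arithmetic for a hypothetical 1,000-word conversation (the word count N is illustrative, not a measured average call length):

```latex
\text{expected errors} \approx \text{WER} \times N = 0.04 \times 1{,}000 = 40 \text{ words}
```

In other words, even a skilled human transcriber working in good conditions would be expected to get roughly 40 words wrong in a call of that length.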

Where AI clearly outpaces humans is speed. A professional transcriber may take hours to process a single hour of audio, but Voyc delivers transcripts in near real time. That means you get accuracy close to human level, combined with machine speed and scalability.

Why Accuracy Can Vary

Real conversations are messy. Crosstalk, muffled microphones, strong accents, jargon and background noise all affect how any system performs, and that is true for every provider, not just Voyc. In technical terms, this is why WER can shift depending on audio quality, speaker accents or domain-specific language. Benchmarks give a useful indication of model performance, but what matters most is how the system handles your own calls.

Working Together

If you ever spot transcripts that don’t meet expectations, please tag them as “transcription is still not as expected”, leave a comment on the conversation and let us know via the help chatbot. Your reports help us understand how our custom models perform in your environment and where to focus improvements. Accuracy is a journey, and we are on it with you.

The Gist

Voyc delivers transcripts that are fast, accurate and built for the real world. Powered by Deepgram’s Nova models, among the best speech-to-text systems in the world, our technology is trained on millions of real conversations and fine-tuned for your industry. It achieves near human-level accuracy and delivers results in a fraction of the time, giving you the clarity you need without the wait.
