Speechify launches a Windows app with on-device AI transcription and dictation

On March 31, 2026, Speechify launched a native Windows application that runs its core AI features — text-to-speech, voice activity detection, and Whisper-based transcription — entirely on-device, without sending audio or text to Speechify’s servers.

The app uses three locally stored models: a neural text-to-speech model (VITS Neural) for reading documents and articles aloud, the open-source Silero model for real-time voice activity detection (detecting when the user starts speaking), and an on-device version of OpenAI’s Whisper for transcription and dictation. On compatible hardware, all three run without cloud involvement. On Copilot+ PCs (Windows devices with neural processing units from AMD, Intel, or Qualcomm), the models run entirely on the NPU. Windows 11 machines with Intel or AMD GPUs are also supported, and users can optionally switch to cloud-based processing or change models mid-session.
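For readers unfamiliar with voice activity detection, the idea of gating transcription on detected speech can be sketched in a few lines of Python. This is an illustration only, not Speechify's code: the function names are hypothetical, and a simple energy threshold stands in for a neural detector such as Silero. The sketch groups consecutive "speech" frames into segments, which are the only parts a transcription model would then need to process.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]  # one short chunk of audio samples in [-1.0, 1.0]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of a frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def energy_vad(threshold: float = 0.1) -> Callable[[Frame], bool]:
    """Hypothetical stand-in detector: a frame is 'speech' if it is loud enough.

    A neural VAD would replace this with a model prediction per frame.
    """
    return lambda frame: rms(frame) >= threshold


def speech_segments(
    frames: List[Frame],
    is_speech: Callable[[Frame], bool],
) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index ranges, end exclusive, flagged as speech."""
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current speech run began, if any
    for i, frame in enumerate(frames):
        if is_speech(frame):
            if start is None:
                start = i
        elif start is not None:
            segments.append((start, i))
            start = None
    if start is not None:  # speech ran through the final frame
        segments.append((start, len(frames)))
    return segments


if __name__ == "__main__":
    silence = [0.0] * 4
    voice = [0.5, -0.5, 0.5, -0.5]
    frames = [silence, voice, voice, silence, voice]
    print(speech_segments(frames, energy_vad()))  # [(1, 3), (4, 5)]
```

In a dictation pipeline, only the frames inside each returned range would be handed to the transcription model, so silence is never processed.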

The privacy implication is direct for writers and journalists working with sensitive source material: interview recordings, unpublished documents, and draft manuscripts processed on-device through Speechify’s Windows app do not leave the device. That sets it apart from browser-based or cloud-dependent tools in the same category. Speechify launched meeting transcription in February 2026, but that feature was browser-only; the native app extends transcription and dictation to any Windows application.

Speechify describes the app as a “full-stack voice app” that builds on its existing text-to-speech focus, covering reading, dictation, and transcription rather than only output. The company has over 50 million users.

The app is available on the Microsoft Store. Competitors in the dictation and transcription category include Wispr Flow, Willow, and Superwhisper.