TechCrunch: Microsoft AI releases three foundational models for text, voice, and images
Microsoft’s AI division (MAI), led by Mustafa Suleyman, released three foundational models on April 2, 2026. The models cover speech transcription, audio generation, and image generation — areas where Microsoft had largely relied on third-party providers, including OpenAI, until now.
The three models
MAI-Transcribe-1 converts spoken language into text across 25 languages. It is positioned as a speech-to-text solution for enterprise applications and developer integrations.
MAI-Voice-1 generates audio from text input, producing up to 60 seconds of audio for each second of processing time. It supports custom voice creation, which lets applications generate consistent synthetic voices tuned to a specific brand or persona.
The third model handles image generation. It had been available in MAI Playground since March 19, ahead of the formal announcement.
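To put the MAI-Voice-1 throughput figure in concrete terms, the short Python sketch below turns it into a best-case processing-time estimate. It is plain arithmetic, not a call to any MAI or Azure API. The function name `synthesis_seconds` and the sample workloads are made up for illustration, and the factor of 60 is the stated upper bound, so real-world throughput may well be lower.

```python
def synthesis_seconds(audio_seconds: float, realtime_factor: float = 60.0) -> float:
    """Best-case processing time needed to generate `audio_seconds` of audio.

    `realtime_factor` is seconds of audio produced per second of processing.
    The default of 60 is the "up to" figure quoted for MAI-Voice-1, so the
    result is a lower bound on processing time, not a measured benchmark.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


if __name__ == "__main__":
    # Hypothetical workloads, chosen only to illustrate the scale.
    workloads = [
        ("30-second voice prompt", 30),
        ("10-minute narration", 10 * 60),
        ("8-hour audiobook", 8 * 60 * 60),
    ]
    for label, duration in workloads:
        print(f"{label}: at least {synthesis_seconds(duration):.1f} s of processing")
```

At the quoted rate, a 10-minute narration would take at least 10 seconds to generate and an 8-hour audiobook at least 8 minutes, before any network or queueing overhead.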
Context
MAI was formed approximately six months before this announcement as a distinct unit within Microsoft, separate from the products team that ships Copilot and Microsoft 365 AI features. The division’s mandate appears to be building Microsoft’s own foundational model stack rather than depending on OpenAI models through the existing partnership.
The timing matters: Microsoft’s deal with OpenAI has evolved as OpenAI has grown more independent and competitive in enterprise markets. Building in-house multimodal capability gives Microsoft more flexibility in how it prices, deploys, and differentiates its AI products over time.
Why it matters for product managers
For PMs evaluating enterprise AI tools, the release adds Microsoft to the list of providers offering proprietary speech and image generation capabilities alongside OpenAI, Google, Anthropic, and ElevenLabs. Teams building on Azure or Microsoft 365 may gain access to these models without switching vendors.
The broader pattern is worth watching: the major platform companies are each moving to own the full stack from foundational models to end-user applications. For product managers building on top of these platforms, that consolidation affects API pricing, model availability, and the strategic hold any single vendor has over their product.