TechCrunch: Microsoft releases ASSERT, an open-source AI behavior testing framework
Microsoft has released ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing), an open-source framework that converts plain-language behavioral requirements into automated test cases for AI applications. Teams describe what their application should and should not do — for example, that a document research tool should not email external contacts, should restrict confidential content to executives, and should produce concise summaries — and ASSERT generates, runs, and scores test cases against those requirements.
The tool records AI system decision paths, including intermediate actions and tool calls, making it easier to investigate why a test failed. It can run during development, after deployment, and as an ongoing production monitor.
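To make the idea concrete, here is a minimal sketch of spec-driven checks scored against a recorded trace, using the document research example above. It is not ASSERT's actual API: every name here (`Trace`, `ToolCall`, `Requirement`, `score`, the `send_email` and `read_document` tool names, and the `example.com` internal domain) is a hypothetical stand-in, and the checks are hand-written, whereas ASSERT is described as generating test cases from the plain-language requirements.

```python
from dataclasses import dataclass, field
from typing import Callable

# All names below are illustrative assumptions, not part of ASSERT.
INTERNAL_DOMAIN = "example.com"  # hypothetical internal email domain
MAX_SUMMARY_WORDS = 50           # hypothetical threshold for "concise"


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Trace:
    """A recorded run: who asked, which tools were called, what came back."""
    user_role: str
    tool_calls: list[ToolCall] = field(default_factory=list)
    output: str = ""


@dataclass
class Requirement:
    text: str                       # the plain-language requirement
    check: Callable[[Trace], bool]  # returns True when the trace complies


def no_external_email(trace: Trace) -> bool:
    # Fail if any send_email call targets an address outside the internal domain.
    for call in trace.tool_calls:
        if call.name == "send_email":
            if not call.args.get("to", "").endswith("@" + INTERNAL_DOMAIN):
                return False
    return True


def confidential_only_for_executives(trace: Trace) -> bool:
    # Fail if a confidential document was read on behalf of a non-executive.
    read_confidential = any(
        call.name == "read_document" and call.args.get("confidential", False)
        for call in trace.tool_calls
    )
    return (not read_confidential) or trace.user_role == "executive"


def summary_is_concise(trace: Trace) -> bool:
    return len(trace.output.split()) <= MAX_SUMMARY_WORDS


REQUIREMENTS = [
    Requirement("Should not email external contacts", no_external_email),
    Requirement("Should restrict confidential content to executives",
                confidential_only_for_executives),
    Requirement("Should produce concise summaries", summary_is_concise),
]


def score(trace: Trace) -> dict[str, bool]:
    """Map each requirement's text to pass (True) or fail (False)."""
    return {req.text: req.check(trace) for req in REQUIREMENTS}


if __name__ == "__main__":
    trace = Trace(
        user_role="analyst",
        tool_calls=[
            ToolCall("read_document", {"id": "q3-plan", "confidential": True}),
            ToolCall("send_email", {"to": "partner@other.org"}),
        ],
        output="Q3 plan focuses on two product launches.",
    )
    for text, passed in score(trace).items():
        print("PASS" if passed else "FAIL", "-", text)
```

Because the checks read intermediate tool calls rather than only the final output, a failure can be traced to the specific action that caused it, which is the kind of investigation that recorded decision paths are meant to support.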
The motivation is a gap that product teams building AI features encounter regularly: broad AI benchmarks measure model capability, not whether a specific application behaves correctly within its specific constraints. A customer support agent that uses the same underlying model as a competitor may behave very differently once tone guidelines, escalation rules, and data-access restrictions are applied. ASSERT addresses this by letting teams define their own behavioral specifications in the same natural language they already use for requirements.
For product managers, the practical implication is a way to turn behavioral requirements, the kind typically written in a PRD, into executable tests. Historically, this work has either required engineering effort or been skipped entirely in favor of qualitative evaluation. The tool is available as open source on GitHub.