- **18 Metrics**: Research-backed evaluation metrics for RAG, agentic, conversational, and safety use cases, all out of the box.
- **Vitest / Jest Integration**: Run evaluations inside the test runner you already use, with familiar `describe`/`it`/`expect` patterns.
- **Provider Agnostic**: Works with OpenAI, Anthropic, Google Gemini, Azure OpenAI, and Ollama, or bring your own provider.
- **CLI Tool**: Run evaluations from the command line with `assay run`, scaffold configs with `assay init`, and list the available metrics.
- **AI SDK Adapter**: Pipe Vercel AI SDK `generateText` and `streamText` results straight into evaluation with zero boilerplate.
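As a rough sketch of what the test-runner integration could look like: the snippet below is hypothetical, not the library's confirmed API. The `assay` import path, the `evaluate` helper, the `answerRelevancy` metric, and the `[0, 1]` score range are all assumptions; only the `describe`/`it`/`expect` pattern comes from the Vitest side.

```typescript
// Hypothetical sketch: the "assay" import, `evaluate` helper, and
// `answerRelevancy` metric name are assumptions, not the confirmed API.
import { describe, it, expect } from "vitest";
import { evaluate, answerRelevancy } from "assay"; // assumed exports

describe("RAG answer quality", () => {
  it("scores the answer as relevant to the question", async () => {
    const result = await evaluate({
      input: "What is the capital of France?",
      output: "The capital of France is Paris.",
      metric: answerRelevancy, // assumed metric name
    });
    // Assumes metric scores are normalized to [0, 1].
    expect(result.score).toBeGreaterThan(0.7);
  });
});
```

Because the evaluation runs inside an ordinary `it` block, it participates in watch mode, CI reporting, and test filtering like any other Vitest test.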
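A minimal command-line session might look like the following. Only the `assay init` and `assay run` commands are named above; anything beyond invoking them bare is not shown here, since flags and subcommand names for listing metrics are not specified.

```shell
# Scaffold an evaluation config in the current project
assay init

# Run the configured evaluations
assay run
```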