Hasan Toor

@hasantoxr

🚨 BREAKING: Someone just open-sourced the missing layer for AI agents, and it's genuinely insane.

It's called LangWatch. The complete platform for LLM evaluation and AI agent testing: trace, evaluate, simulate, and monitor your agents end-to-end before a single user sees them.

Here's what you actually get:

→ End-to-end agent simulations - run full-stack scenarios (tools, state, user simulator, judge) and pinpoint exactly where your agent breaks, decision by decision
→ Closed eval loop - Trace → Dataset → Evaluate → Optimize prompts → Re-test. Zero glue code, zero tool sprawl
→ Optimization Studio - iterate on prompts and models with real eval data backing every change
→ Annotations & queues - let domain experts label edge cases, catch failures your evals miss
→ GitHub integration - prompt versions live in Git, linked directly to traces
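
To make the simulation idea concrete, here is a minimal sketch in plain Python. It is not LangWatch's API: `run_scenario`, `toy_agent`, and `toy_judge` are hypothetical names invented for illustration, and a real judge would usually be an LLM or a rubric rather than a string check. It only shows the shape of the technique: replay scripted user turns, let a judge score each reply, and report the first turn where the agent breaks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical illustration only: none of these names come from LangWatch.

@dataclass
class Turn:
    user: str
    agent: str

@dataclass
class ScenarioResult:
    passed: bool
    failed_turn: Optional[int]  # index of the first turn the judge rejected
    transcript: List[Turn] = field(default_factory=list)

def run_scenario(
    agent: Callable[[str], str],
    user_messages: List[str],
    judge: Callable[[str, str], bool],
) -> ScenarioResult:
    """Replay scripted user turns and stop at the first reply the judge rejects."""
    transcript: List[Turn] = []
    for i, message in enumerate(user_messages):
        reply = agent(message)
        transcript.append(Turn(message, reply))
        if not judge(message, reply):
            return ScenarioResult(False, i, transcript)
    return ScenarioResult(True, None, transcript)

def toy_agent(message: str) -> str:
    # Stand-in for a real agent: it only knows how to handle refunds.
    if "refund" in message.lower():
        return "I have opened a refund request for you."
    return "Sorry, I cannot help with that."

def toy_judge(message: str, reply: str) -> bool:
    # Stand-in for a real judge: any apology counts as a failure.
    return not reply.startswith("Sorry")

result = run_scenario(
    toy_agent,
    ["I want a refund", "Where is my order?"],
    toy_judge,
)
print(result.passed, result.failed_turn)
```

Run the same scenario after every prompt or model change and you have a basic regression test: a failure points at a specific turn instead of a vague "the agent got worse".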

Here's the wild part:

It's OpenTelemetry-native. Framework-agnostic: works with LangChain, LangGraph, CrewAI, Vercel AI SDK, Mastra, and Google ADK. Model-agnostic too: OpenAI, Anthropic, Azure, AWS, Groq, Ollama.

Most teams shipping AI agents have zero regression testing. No simulations. No systematic eval loop.

They find out their agent broke when a user tweets about it.

LangWatch fixes that. One docker compose command to self-host.

Full MCP support for Claude Desktop. ISO 27001 certified.

100% Open Source.

(Link in the comments)
