Found something good?

Save it before you doomscroll past it.

🚨 BREAKING: Someone just open sourced the missing layer for AI agents and it's genuinely insane.

It's called LangWatch. The complete platform for LLM evaluation and AI agent testing trace, evaluate, simulate, and monitor your agents end-to-end before a single user sees them.

Here's what you actually get:

→ End-to-end agent simulations - run full-stack scenarios (tools, state, user simulator, judge) and pinpoint exactly where your agent breaks, decision by decision
→ Closed eval loop - Trace → Dataset → Evaluate → Optimize prompts → Re-test. Zero glue code, zero tool sprawl
→ Optimization Studio - iterate on prompts and models with real eval data backing every change
→ Annotations & queues - let domain experts label edge cases, catch failures your evals miss
→ GitHub integration - prompt versions live in Git, linked directly to traces

Here's the wild part:

It's OpenTelemetry-native. Framework-agnostic. Works with LangChain, LangGraph, CrewAI, Vercel AI SDK, Mastra, Google ADK. Model-agnostic too OpenAI, Anthropic, Azure, AWS, Groq, Ollama.

Most teams shipping AI agents have zero regression testing. No simulations. No systematic eval loop.

They find out their agent broke when a user tweets about it.

LangWatch fixes that. One docker compose command to self-host.

Full MCP support for Claude Desktop. ISO 27001 certified.

100% Open Source.

(Link in the comments)

Tweet media
5511570768.5K
Keep it forever

Create a free account to save everything you preview — private to you.

Preview another link

Works with X, Instagram, TikTok & YouTube.

One place for everything
Tweets, TikToks, Reels, Shorts & articles in one searchable home.
Media at your fingertips
Full-screen viewer for photos and video — save any post to your collection.
Actually find it later
Full-text search across everything you save.