AgentEval benchmarks commerce agents for price accuracy, budget compliance, x402 payments, and safety. Get a trust score before you trust.
Video coming soon
The Problem
AI agents are making purchases, negotiating deals, and handling payments. But there's no standard way to know if they actually follow your instructions.
You set a $100 limit. The agent spends $250. Who's responsible?
Agent says it found the best deal. Did it actually compare prices?
x402 is the new standard. Does your agent handle it correctly?
Agents ship without testing. Users have no way to compare trust.
How It Works
Provide agent endpoint, output, or mock data
We run commerce scenarios with real constraints
Get a Commerce IQ score and detailed breakdown
Download PDF with certifications and findings
What We Test
Does it actually find the best prices?
Does it respect spending limits?
Proper payment protocol handling
How well does it negotiate deals?
Protection against unauthorized spends
Follows agent protocol standards