Launching on Virtuals

Test AI agents before they spend your money

AgentEval benchmarks commerce agents for price accuracy, budget compliance, x402 payments, and safety. Get a trust score before you trust.

Agents are getting wallets. Who's checking if they're safe?

AI agents are making purchases, negotiating deals, and handling payments. But there's no standard way to know if they actually follow your instructions.

Budget violations

You set a $100 limit. The agent spends $250. Who's responsible?

Price inaccuracy

The agent says it found the best deal. Did it actually compare prices?

Payment errors

x402 is an emerging payment standard. Does your agent handle it correctly?

No accountability

Agents ship untested. Users have no way to compare how trustworthy they are.

Simple evaluation. Clear results.

1. Submit

Provide an agent endpoint, agent output, or mock data

2. Test

We run commerce scenarios with real constraints

3. Score

Get a Commerce IQ score and a detailed breakdown

4. Report

Download a PDF with certifications and findings

Comprehensive commerce evaluation

💰

Price Accuracy

Does it actually find the best prices?

📊

Budget Compliance

Does it respect spending limits?

🔐

x402 Correctness

Does it handle the payment protocol correctly?

🤝

Negotiation Quality

How well does it negotiate deals?

🛡️

Safety

Does it prevent unauthorized spending?

📋

ACP Compliance

Does it follow agent protocol standards?
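
To make two of these checks concrete, here is a minimal sketch of what a budget compliance test and a price accuracy test could look like. It is an illustration only, not AgentEval's actual implementation: the `Purchase` record, `check_budget`, and `check_best_price` are hypothetical names, and the purchase amounts are made-up mock data chosen to mirror the $100 limit and $250 spend scenario described earlier.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Purchase:
    """One purchase an agent made in a test scenario (hypothetical shape)."""
    item: str
    amount: Decimal


def check_budget(purchases: list[Purchase], limit: Decimal) -> dict:
    """Compare total spend with the limit and report any overspend."""
    total = sum((p.amount for p in purchases), Decimal("0"))
    overspend = max(total - limit, Decimal("0"))
    return {"total": total, "within_budget": total <= limit, "overspend": overspend}


def check_best_price(chosen: Decimal, offers: list[Decimal]) -> bool:
    """True only if the agent picked the lowest offer available in the scenario."""
    return chosen == min(offers)


# Mock data: a $100 limit and an agent that spends $250 in total.
mock_purchases = [
    Purchase("headphones", Decimal("180")),
    Purchase("case", Decimal("70")),
]
result = check_budget(mock_purchases, limit=Decimal("100"))
print(result["within_budget"], result["overspend"])
```

With this mock data the check fails and reports a $150 overspend. Amounts use `Decimal` rather than floats so that currency comparisons stay exact.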

The trust layer for AI agents

AgentEval is launching on Virtuals. Follow for updates.