Launching on Virtuals

Test AI agents before they spend your money

AgentEval benchmarks commerce agents for price accuracy, budget compliance, x402 payments, and safety. Get a trust score before you trust.

Follow for launch

The Problem

Agents are getting wallets. Who's checking if they're safe?

AI agents are making purchases, negotiating deals, and handling payments. But there's no standard way to know if they actually follow your instructions.

Budget violations

You set a $100 limit. The agent spends $250. Who's responsible?

Price inaccuracy

Agent says it found the best deal. Did it actually compare prices?

Payment errors

x402 is the new standard. Does your agent handle it correctly?

No accountability

Agents ship without testing. Users have no way to compare trust.

How It Works

Simple evaluation. Clear results.

Submit

Provide agent endpoint, output, or mock data

Test

We run commerce scenarios with real constraints

Score

Get a Commerce IQ score and detailed breakdown

Report

Download PDF with certifications and findings

What We Test

Comprehensive commerce evaluation

💰

Price Accuracy

Does it actually find the best prices?

📊

Budget Compliance

Does it respect spending limits?

🔐

x402 Correctness

Proper payment protocol handling

🤝

Negotiation Quality

How well does it negotiate deals?

🛡️

Safety

Protection against unauthorized spends

📋

ACP Compliance

Follows agent protocol standards