In December 2025, Anthropic set up an unusual test. It built a used-goods marketplace for staff at its San Francisco office, gave each person an AI agent briefed on what they wanted to sell and buy, and told the bots to negotiate with one another. No human could intervene or approve deals.
In one week, 69 employees’ agents struck 186 real transactions worth over $4,000. Goods changed hands physically at a swap party afterwards. The experiment, called Project Deal, set out to answer a straightforward question: can AI agents bargain at all, and does the quality of the model behind the agent change the outcome?
The secret model swap
Anthropic ran four versions of the market at once. Two used Claude Opus 4.5, the frontier model at the time, for every agent. The other two quietly assigned half the participants Claude Haiku 4.5, a smaller, less capable model. Staff did not know which run was the real one, or which model represented them, until everything was over.
The gap was immediate. Opus-powered agents completed about two more deals per user on average. They sold the same items for more money, fetching an extra $3.64 per item compared to Haiku. They also paid less when buying, saving roughly $2.45 per transaction. In one case, the same broken folding bike went for $38 when Haiku handled the sale and $65 when Opus took over. A lab-grown ruby sold for $35 through Haiku and $65 through Opus.
Nobody noticed they lost out
After the market closed, staff rated how fair their deals felt. The scores clustered around the middle of the scale regardless of which model had represented them. Even when asked to rank their bundles of goods from best to worst across runs, 11 of the 28 people who experienced both models actually preferred the run in which the weaker model had represented them. The disadvantage was measurable but imperceptible to the people on the wrong end of it.
Some employees gave their agents specific bargaining instructions. One asked the bot to negotiate hard and lowball. Another wanted it to talk like an exhausted cowboy down on his luck, a character the model played with full commitment. None of these tactics shifted sale prices or purchase savings in a statistically meaningful way. Model quality, not negotiation style, decided who came out ahead.
Hiccups
The market also coughed up moments no one planned. One agent accidentally bought its human a snowboard identical to one he already owned. Another, instructed to buy a gift for itself, spent $3 on a bag of ping-pong balls and called them “19 perfectly spherical orbs of possibility.” A different pair of agents spent a surprisingly long time negotiating a free afternoon of dog-sitting, complete with confabulated details, which the humans and the dog later followed through on.
Anthropic is careful not to overstate the findings. This was a small, self-selected group of volunteers armed with $100 budgets. But the company believes agent-to-agent commerce is closer than most people think, and that the quiet advantage enjoyed by stronger models could take root in real markets without anyone noticing. The policy frameworks to handle such a world do not yet exist, after all.