I’ve built a small project that explores questions through adversarial debate between LLMs.

The core idea:

  • Two models are randomly assigned opposing sides of a controversial proposition (e.g. "X should be legal").
  • The user votes on the proposition before the debate, reads both arguments, and then votes again.
  • Only afterward does the system reveal which model argued which side.
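The flow above can be sketched in a few lines. This is a hypothetical illustration, not the actual implementation; the model list and function names are mine:

```python
import random

# Illustrative model pool (the post mentions these three).
MODELS = ["gpt-4o", "gemini-2.5-flash", "grok-3"]

def setup_debate(proposition: str) -> dict:
    """Randomly pick two distinct models and assign them opposing sides.

    Model identities stay hidden from the user until after the second vote.
    """
    pro_model, con_model = random.sample(MODELS, 2)
    return {
        "proposition": proposition,
        "pro": pro_model,   # argues in favor of the proposition
        "con": con_model,   # argues against it
        "revealed": False,  # flipped only after the user's post-debate vote
    }
```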

Online content is often one-sided argumentation: a steelman for one view and a strawman for the other. This tool tries to surface strong, opposing arguments side by side, ideally giving users the best case from each side.

With enough usage, I want to use it to benchmark LLMs by how often they change users’ minds, on the assumption that models with strong biases will do worse when randomly assigned to argue either side of a debate.
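One way that benchmark could be scored, as a rough sketch (the vote scale and field names are assumptions, not the site’s actual data model): a model “wins” a debate when the user’s post-debate vote moves toward the side the model was assigned.

```python
from collections import defaultdict

def persuasion_rate(debates):
    """Fraction of debates in which each model moved the user toward its side.

    Each debate record is a dict with keys:
      'model'       - the model's name
      'side'        - 'pro' or 'con' (randomly assigned)
      'vote_before' - user's stance before the debate, on a -1..1 scale
      'vote_after'  - user's stance after reading both arguments
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for d in debates:
        total[d["model"]] += 1
        moved = d["vote_after"] - d["vote_before"]
        # Positive movement favors 'pro', negative favors 'con'.
        if (d["side"] == "pro" and moved > 0) or (d["side"] == "con" and moved < 0):
            wins[d["model"]] += 1
    return {m: wins[m] / total[m] for m in total}
```

Because sides are randomized, a model that can only argue convincingly for one side of each issue should score lower on average than one that argues both sides well.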

It currently uses GPT-4o, Gemini 2.5 Flash, and Grok-3. It’s early-stage, and a bit verbose and rhetorical at times, but I’d appreciate feedback from people who care about epistemic clarity and dialectical reasoning.

Link: https://bot-bicker.vercel.app/ 
Would love thoughts on:

  • The underlying premise (AI debates as a path to truth-seeking)
  • Design flaws or ways to better surface useful disagreements
  • Anything you think would make this more valuable as an epistemic tool
