I’ve built a small project that explores questions through adversarial debate between LLMs.

The core idea:

  • Two models are randomly assigned opposing sides of a controversial proposition (e.g. "X should be legal").
  • The user votes on the proposition before the debate, reads both arguments, and then votes again.
  • Only afterward does the system reveal which model argued which side.
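The flow above can be sketched in a few lines. This is a hypothetical illustration, not the actual implementation; the model list and function names are mine:

```python
import random

# Illustrative model pool (the post mentions these three).
MODELS = ["gpt-4o", "gemini-2.5-flash", "grok-3"]

def setup_debate(proposition: str) -> dict:
    """Randomly pick two distinct models and assign them opposing sides.

    Model identities stay hidden from the user until after the second vote.
    """
    pro_model, con_model = random.sample(MODELS, 2)
    return {
        "proposition": proposition,
        "pro": pro_model,   # argues in favor of the proposition
        "con": con_model,   # argues against it
        "revealed": False,  # flipped only after the user's post-debate vote
    }
```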

Online content is often one-sided argumentation: a steelman for one view and a strawman for the other. This tool tries to surface strong, opposing arguments side by side, ideally giving users the best case from each side.

With enough usage, I want to use it to benchmark LLMs by how often they change users’ minds, on the assumption that models with strong biases will do worse when randomly assigned to argue either side of a debate.
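One way that benchmark could be scored, as a rough sketch (the vote scale and field names are assumptions, not the site’s actual data model): a model “wins” a debate when the user’s post-debate vote moves toward the side the model was assigned.

```python
from collections import defaultdict

def persuasion_rate(debates):
    """Fraction of debates in which each model moved the user toward its side.

    Each debate record is a dict with keys:
      'model'       - the model's name
      'side'        - 'pro' or 'con' (randomly assigned)
      'vote_before' - user's stance before the debate, on a -1..1 scale
      'vote_after'  - user's stance after reading both arguments
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for d in debates:
        total[d["model"]] += 1
        moved = d["vote_after"] - d["vote_before"]
        # Positive movement favors 'pro', negative favors 'con'.
        if (d["side"] == "pro" and moved > 0) or (d["side"] == "con" and moved < 0):
            wins[d["model"]] += 1
    return {m: wins[m] / total[m] for m in total}
```

Because sides are randomized, a model that can only argue convincingly for one side of each issue should score lower on average than one that argues both sides well.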

It currently uses GPT-4o, Gemini 2.5 Flash, and Grok-3. It’s early-stage, and a bit verbose and rhetorical at times, but I’d appreciate feedback from people who care about epistemic clarity and dialectical reasoning.

Link: https://bot-bicker.vercel.app/ 
Would love thoughts on:

  • The underlying premise (AI debates as a path to truth-seeking)
  • Design flaws or ways to better surface useful disagreements
  • Anything you think would make this more valuable as an epistemic tool
