
What is AutoArena?
AutoArena is an open-source tool that automates head-to-head evaluations of GenAI systems using LLM judges. It lets users quickly and accurately generate leaderboards comparing different LLMs, RAG setups, or prompt variations, and fine-tune custom judges to fit their specific needs.
How to use AutoArena?
Install AutoArena locally using `pip install autoarena`. Collect the inputs (user prompts) and outputs (model responses) from your generative AI system. Then use the tool to run head-to-head evaluations with LLM judges and rank your systems. To collaborate with team members, use AutoArena Cloud at autoarena.app.
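As an illustration of the "define your inputs and outputs" step, the sketch below writes prompt/response pairs to CSV text using only the Python standard library. The column names `prompt` and `response`, the helper `responses_to_csv`, and the sample data are assumptions made for this example; check the AutoArena documentation for the exact input format it expects.

```python
import csv
import io


def responses_to_csv(pairs):
    """Serialize (prompt, response) pairs to CSV text with a header row.

    The "prompt" and "response" column names are an assumption for this sketch.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["prompt", "response"])
    writer.writerows(pairs)
    return buffer.getvalue()


# Hypothetical sample data: one row per prompt answered by a single model.
pairs = [
    ("What is 2 + 2?", "4"),
    ("Name a primary color.", "Red, for example."),
]
csv_text = responses_to_csv(pairs)
print(csv_text)
```

One such file per model (or per prompt variation) keeps each system's outputs separate, so they can be compared pairwise on the same prompts.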
AutoArena’s Core Features
- Automated head-to-head evaluation using LLM judges
- Leaderboard generation for comparing LLMs, RAG setups, and prompt variations
- Fine-tuning of custom judges
- Elo score and Confidence Interval computation
- Integration with GitHub for CI/CD
- Parallelization, randomization, and rate limiting handling
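To make the Elo score and confidence interval feature more concrete, here is a minimal sketch of the textbook Elo update with a simple bootstrap interval. It is not AutoArena's implementation, which may differ; the K-factor of 32, the initial rating of 1000, the function names, and the sample match data are all assumptions for illustration.

```python
import random


def expected_score(r_a, r_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(matches, k=32.0, initial=1000.0):
    """Apply sequential Elo updates.

    Each match is (model_a, model_b, score_a), where score_a is
    1.0 for an A win, 0.0 for a B win, and 0.5 for a tie.
    """
    ratings = {}
    for a, b, score_a in matches:
        r_a = ratings.setdefault(a, initial)
        r_b = ratings.setdefault(b, initial)
        e_a = expected_score(r_a, r_b)
        ratings[a] = r_a + k * (score_a - e_a)
        ratings[b] = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return ratings


def bootstrap_interval(matches, model, rounds=200, alpha=0.05, seed=0,
                       initial=1000.0):
    """Percentile bootstrap interval for one model's Elo rating."""
    rng = random.Random(seed)
    samples = []
    for _ in range(rounds):
        # Resample matches with replacement and recompute the ratings.
        resampled = [rng.choice(matches) for _ in matches]
        samples.append(elo_ratings(resampled).get(model, initial))
    samples.sort()
    low = samples[int((alpha / 2) * (rounds - 1))]
    high = samples[int((1 - alpha / 2) * (rounds - 1))]
    return low, high


# Hypothetical judge verdicts: model-a wins 7 of 10 head-to-heads.
matches = [("model-a", "model-b", 1.0)] * 7 + [("model-a", "model-b", 0.0)] * 3
ratings = elo_ratings(matches)
low, high = bootstrap_interval(matches, "model-a")
print(ratings, (low, high))
```

Sequential Elo depends on the order in which matches are processed, which is one reason to report an interval around each score rather than a single number.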
AutoArena’s Use Cases