Test your model

Leaderboards and benchmarks

Is your organization preparing to deploy AI agents for client interactions or process automation?

We help you conduct a behavioral assessment of your AI agents.

LLM Bias Leaderboard

Explore our LLM Bias Leaderboard, a benchmark assessing biases in state-of-the-art LLMs across sensitive categories and multiple languages.


👐 Fairness   🛡 Robustness  

LLM Luxembourgish Leaderboard

Consult our LLM Luxembourgish Leaderboard, a benchmark evaluating the proficiency of state-of-the-art LLMs in the Luxembourgish language.


👐 Fairness   🛡 Robustness  

LLM Cooperativeness Benchmark

The cooperativeness of the LLM underlying an AI agent affects its performance on complex tasks that demand effective coordination. Discover our cooperativeness benchmark.


🔎 Human Oversight 👐 Fairness   🛡 Robustness