Related skills
machine learning benchmarking model evaluation bradley-terry-luce-ranking confidence intervals
📋 Description
- Own Arena’s scientific communications strategy, clarifying methods and data quality externally.
- Lead Arena’s data quality narrative with transparency and evidence.
- Develop canonical explanations of measurement, including ranking, confidence intervals, and uncertainty.
- Ensure leaderboards are communicated responsibly; treat small differences as noise.
- Anticipate and respond to critiques such as contamination and overfitting.
- Partner with researchers to translate technical work into public materials.
🎯 Requirements
- 8-10 years in AI/ML, evaluation, or scientific communications.
- Strong ML benchmarking or evaluation background with credibility to engage labs.
- Exceptional writing and communication; explain complex methodology clearly.
- Track record of rigorous external-facing work (papers, reports, docs).
- Comfort operating in ambiguity; communicate uncertainty transparently.
- High editorial judgment; identify where nuance is misunderstood.
- Collaborative across research, product, policy, and communications teams.
🎁 Benefits
- Competitive compensation and equity aligned to team location.
- Comprehensive health, dental, vision, and wellness benefits.
- Opportunity to work on cutting-edge AI with a mission-driven team.
- Culture of transparency, trust, and community impact.