Designing and building AI evaluation systems: datasets, replay, scorers, dashboards.
Designing feedback loops from real usage: collect and interpret user signals.
Developing tooling for debugging agent behavior: surfacing failures and insights.
Improving reliability by defining good/bad/degraded sessions and alerts.

🎯 Requirements

Built AI evals, experiments, ranking, or search quality systems.
Strong data acumen; collaborate with data scientists and researchers.
Strong opinions on model and agent behaviors; stay current on trends.
Strong software engineering fundamentals and experience shipping production systems.
