Develop a rigorous understanding of what makes an agent a great collaborator.
Turn judgments about model behavior into hypotheses, evals, graders, and training.
Study user signals to understand trust, satisfaction, and outcomes.
Work with experts to produce high-quality rollout data and evaluations.
Improve reward models and RL objectives for model behaviors.
Partner with ChatGPT, Codex, and other teams to validate improvements in real workflows.

🎯 Requirements

Think from the user’s perspective and care about how models feel to use.
Translate subjective product questions into falsifiable hypotheses and evaluations.
Preserve individuality, adaptability, and behavioral diversity.
Shape how frontier agents communicate and build trust.
Strong foundations in ML, software engineering, statistics, and HCI; quick to learn a new stack.
Experience with LLMs, post-training, RL/RLHF, and reward modeling.
