Related skills
training llms rlhf synthetic data post-training
📋 Description
- Develop a rigorous understanding of what makes an agent a great collaborator.
- Turn judgments about model behavior into hypotheses, evals, graders, and training.
- Study user signals to understand trust, satisfaction, and outcomes.
- Work with experts to produce high-quality rollout data and evaluations.
- Improve reward models and RL objectives for model behaviors.
- Partner with ChatGPT, Codex, and other teams to validate improvements in real workflows.
🎯 Requirements
- Think from the user’s perspective and care about how models feel.
- Translate subjective product questions into falsifiable hypotheses and evaluations.
- Preserve individuality, adaptability, and behavioral diversity.
- Shape how frontier agents communicate and build trust.
- Strong foundations in ML, software, stats, and HCI; quick to learn the stack.
- Experience with LLMs, post-training, RL/RLHF, reward modeling.