Type
Full time
Related skills
llms, rlhf, rl, post-training, evals
Description
- Design and run experiments to improve agentic model behavior for computer use (desktop/browser).
- Own end-to-end post-training improvements: RL, data pipelines, graders, rewards, evals, diagnostics.
- Build evals/environments to expose model failures and convert them into training data or fixes.
- Partner with Codex/ChatGPT teams to translate user needs into model improvements.
- Work on early training and alignment interventions (data mixtures, objectives, synthetic data, eval loops).
- Decide which integrations and fixes are ready for major model runs.