Researcher, Computer Use - Agent Post-Training

Type: Full time

Related skills

LLMs, RLHF, RL, post-training, evals

📋 Description

  • Design and run experiments to improve agentic model behavior for computer use (desktop/browser).
  • Own end-to-end post-training improvements: RL, data pipelines, graders, rewards, evals, diagnostics.
  • Build evals/environments to expose model failures and convert them into training data or fixes.
  • Partner with Codex/ChatGPT teams to translate user needs into model improvements.
  • Work on early training and alignment interventions (data mixtures, objectives, synthetic data, eval loops).
  • Decide which integrations and fixes are ready for major model runs.