Post-Training Applied Researcher

Added
22 days ago
Type
Full time

Related skills

machine learning, LLM, PPO, DPO, SFT

📋 Description

  • Design and run post-training pipelines (SFT, GRPO, DPO, RLVR)
  • Build task-specific training environments and evals for healthcare, code, and legal
  • Translate production data into training signals; design reward loops
  • Run end-to-end training experiments; diagnose reward hacking and drift
  • Publish findings and contribute to Baseten's open-source training libraries

🎯 Requirements

  • Hands-on LLM training with reinforcement learning (GRPO/PPO)
  • Strong reward engineering intuition; ability to distinguish effective rewards from exploitable ones
  • Experience building multi-turn agent environments with tool use
  • Comfort with end-to-end ML pipelines, from data to deployment
  • Experience with production ML systems; closed-loop production data preferred
  • Experience with RL training frameworks
  • Publications at NeurIPS/ICML/ICLR on RL for LLMs, reward modeling, or alignment

🎁 Benefits

  • Competitive pay with meaningful equity
  • 100% medical, dental, and vision for you and dependents
  • Generous PTO including Winter Break
  • Paid parental leave
  • Company-facilitated 401(k)
  • Exposure to ML startups for learning and networking