Research Scientist, Safety Post Training

Type
Full time

Related skills

AI, RLHF, DPO, interpretability, robustness

📋 Description

  • Design and run post-training pipelines to study safety, robustness, and alignment.
  • Develop interpretability-informed evaluations to reveal unsafe or undesirable behaviors.
  • Collaborate with policymakers, engineers, and researchers to translate findings into safety standards, benchmarks, and best practices.

🎯 Requirements

  • Commitment to safe, secure, and trustworthy AI deployments.
  • Experience with post-training and RL techniques such as RLHF, DPO, and GRPO.
  • A track record of published ML research, particularly in generative AI.
  • At least three years of experience addressing sophisticated ML problems in research or product development.
  • Strong written and verbal communication in cross-functional teams.
  • Nice to have: mechanistic interpretability, probing, or adversarial evaluation of post-trained models.

🎁 Benefits

  • Comprehensive health, dental, and vision coverage.
  • Retirement benefits.
  • Learning and development stipend.
  • Generous PTO.
  • Commuter stipend.
  • Equity-based compensation subject to board approval.