Added
less than a minute ago
Location
Type
Full time
Related skills
ai rlhf dpo interpretability robustness
Description
- Design and run post-training pipelines to study safety, robustness, and alignment.
- Develop interpretability-informed evaluations to reveal unsafe or undesirable behaviors.
- Collaborate with policymakers, engineers, and researchers to translate findings into safety standards, benchmarks, and best practices.
Requirements
- Commitment to safe, secure, and trustworthy AI deployments.
- Experience with post-training and RL techniques such as RLHF, DPO, or GRPO.
- A track record of published ML research, particularly in generative AI.
- At least three years of experience addressing sophisticated ML problems in research or product development.
- Strong written and verbal communication in cross-functional teams.
- Nice to have: mechanistic interpretability, probing, or adversarial evaluation of post-trained models.
Benefits
- Comprehensive health, dental, and vision coverage.
- Retirement benefits.
- Learning and development stipend.
- Generous PTO.
- Commuter stipend.
- Equity-based compensation subject to board approval.