Added 17 days ago
Type: Full time
Related skills: kubernetes, rlhf, hpc, rl, slurm
Description
- Lead product strategy for Research Training Stack and orchestration tools.
- Evolve SUNK (Slurm on Kubernetes) for deterministic bare-metal HPC.
- Drive training services and eval frameworks for model quality.
- Build RL/RLHF pipelines enabling efficient model refinement.
- Partner with global AI labs to translate research needs into roadmaps.
Requirements
- 15+ years of engineering leadership experience.
- 5+ years of managing large-scale infrastructure at AI labs or cloud providers.
- Domain expertise: Slurm, Kubernetes, InfiniBand/RDMA for training.
- Research mindset: frontier model pre/post-training experience.
- Scaling: multi-thousand GPU clusters (H100/Blackwell/Rubin).
- Strategic vision for next-gen AI stack (RL loops, sandbox envs).
Benefits
- Hybrid work; remote options for eligible candidates.
- Onboarding at a hub within the first month; quarterly team gatherings.
- Comprehensive benefits: medical, dental, vision, life.
- 401(k) with match; equity awards; ESPP.
- Tuition reimbursement; parental leave; flexible PTO.
- Casual, innovative culture focused on disruption.