Location
Type
Full time
Related skills
kubernetes pytorch triton mpi kueue
Description
- Architect large-scale scheduling for Kubernetes clusters (1k+ nodes, 10k+ pods)
- Maximize GPU utilization with fractional allocation and fairshare scheduling
- Optimize placement and topology for multi-GPU workloads
- Enhance cluster performance via etcd tuning and in-place pod resizing
- Secure AI workloads with multi-layer isolation and sandboxes
- Enable distributed training with gang scheduling (Volcano, Kueue, LWS)
Requirements
- Deep expertise in Kubernetes core and in CRDs for advanced scheduling
- Experience with AI schedulers: Kueue, Volcano, YuniKorn, Run:ai
- Knowledge of GPU hardware, topology, and interconnects
- Resource management: DRF, load-aware scheduling, bin-packing
- Container runtimes, rootless containers, and security contexts
- AI/ML frameworks: LLM serving, prefill-decode disaggregation, Triton
Benefits
- Career development resources and training (LinkedIn Learning access, reimbursement)
- Well-being programs: EAP, local meetups, flexible time off
- Equity compensation and Employee Stock Purchase Program
- Conference and education reimbursement
- Inclusive, equal-opportunity employer