Type
Full time

Related skills

Kubernetes, PyTorch, Triton, MPI, Kueue

📋 Description

  • Architect large-scale scheduling for Kubernetes clusters (1k+ nodes, 10k+ pods)
  • Maximize GPU utilization with fractional allocation and fairshare scheduling
  • Optimize placement and topology for multi-GPU workloads
  • Enhance cluster performance via etcd tuning and in-place pod resizing
  • Secure AI workloads with multi-layer isolation and sandboxes
  • Enable distributed training with gang scheduling (Volcano, Kueue, LWS)

🎯 Requirements

  • Kubernetes core expertise and CRDs for advanced scheduling
  • Experience with AI schedulers: Kueue, Volcano, YuniKorn, Run:ai
  • GPU hardware/topology knowledge and interconnects
  • Resource management: DRF, load-aware scheduling, bin-packing
  • Container runtimes, rootless containers, and security contexts
  • AI/ML frameworks: LLM serving, prefill-decode disaggregation, Triton

🎁 Benefits

  • Career development resources and training (LinkedIn Learning access, reimbursement)
  • Well-being programs: EAP, local meetups, flexible time off
  • Equity compensation and Employee Stock Purchase Program
  • Conference and education reimbursement
  • Inclusive, equal-opportunity employer