Staff Software Engineer - Managed Kubernetes

Added: 4 days ago
Type: Full time

Related skills

python kubernetes go slurm nccl

📋 Description

  • Drive technical vision for Lambda's managed Kubernetes bare-metal platform
  • Integrate NVIDIA GPU Operator, DCGM, NCCL, and topology tools
  • Design GPU-aware orchestration and multi-tenant Kubernetes
  • Lead development of services powering our managed platform
  • Build foundation for Managed Slurm on Kubernetes
  • Design platform services for inference, autoscaling, and multi-model deployment

🎯 Requirements

  • 10+ years in software/platform engineering or SRE, with 5+ years on Kubernetes at scale
  • Expert-level Kubernetes internals: API machinery, controllers, schedulers, CRDs, CSI, CNI
  • Production-quality code in Go and Python
  • GPU orchestration in Kubernetes: NVIDIA GPU Operator, DCGM, MIG
  • Leadership: drive design decisions and mentor engineers
  • Observability at scale: Prometheus, Grafana, tracing, alerting

🎁 Benefits

  • Build core platform services for AI workloads
  • NVIDIA partnership with cutting-edge tooling
  • Tackle massive-scale GPU clusters
  • Cross-stack influence across network, storage, compute
  • Competitive compensation and equity
  • Health, dental, vision; wellness stipend; 401(k) match; PTO