Senior DevOps Engineer (AI & Cloud Infrastructure)

Added
18 minutes ago
Type
Full time

Related skills

azure terraform aws grafana prometheus

📋 Description

  • Architect, deploy, and operate large-scale LLM inference servers with low latency.
  • Design cloud architectures across Azure and AWS.
  • Manage GPU-enabled Kubernetes clusters and Slurm HPC for AI workloads.
  • Deploy Kubernetes components and operators (GPU, ingress, CNIs, CSIs).
  • Build IaC and deployment workflows with Terraform, Helm, Kustomize, ArgoCD.
  • Design and maintain centralized observability with Prometheus, Grafana, ClickHouse.

🎯 Requirements

  • 5+ years in DevOps, SRE, or ML infra for production systems.
  • Azure and AWS: storage, compute, networking, and databases.
  • Kubernetes admin with GPU scheduling; Slurm desirable.
  • Deploy/scale/operate LLMs and inference engines (vLLM, TGI, Triton).
  • DevOps tooling: Terraform, Helm, Kustomize, ArgoCD; CI (GitHub/GitLab).
  • Python and Bash scripting; debugging distributed systems at scale.

🎁 Benefits

  • Diverse medical, dental and vision options
  • 401k matching program
  • Unlimited paid time off
  • Parental leave and flexibility for all parents and caregivers
  • Support of country-specific visa needs for international employees living in the Bay Area