Architect and maintain a Kubernetes-based platform on AWS and on-prem.
Develop and manage infrastructure with Terraform (IaC).
Design and optimize AI/ML job scheduling using Slurm on Kubernetes.
Provision, manage, and maintain on-prem bare-metal GPU infrastructure.
Implement networking (CNI, service mesh) and storage (CSI, S3) for hybrid workloads.
Build an observability stack and automate operational tasks and incident response.

🎯 Requirements

5+ years in Platform Engineering, DevOps, or SRE.
Proven Terraform experience in production infrastructure.
Expert-level Kubernetes architecture and operations experience in large-scale environments.
Experience with HPC schedulers, especially Slurm, for GPU AI workloads.
Experience managing bare-metal infrastructure (PXE, MAAS).
Strong scripting and automation skills (Python, Go, Bash).

🎁 Benefits

Medical, dental, vision benefits
Annual wellness stipend
Mental health support
Life insurance and short-term and long-term disability (STD, LTD) income insurance
Unlimited PTO
401(k) plan with company match
