Architect and maintain Kubernetes on AWS and on-premises.
Build and manage IaC with Terraform for reproducible environments.
Design and optimize AI/ML job scheduling with Slurm on Kubernetes.
Provision and manage on-prem bare metal servers for GPU computing.
Implement networking (CNI/service mesh) and storage (CSI/S3) for hybrid workloads.
Develop observability and automation for operations and incidents.

🎯 Requirements

5+ years in Platform Engineering, DevOps, or SRE
Hands-on Terraform experience in production
Expert knowledge of Kubernetes in large-scale environments
Experience with Slurm HPC scheduler for GPU workloads
Experience provisioning bare metal servers (PXE, MAAS) and managing their lifecycle
Strong scripting skills (Python/Go/Bash)

🎁 Benefits

Medical, dental, and vision benefits
Annual wellness stipend
Mental health support
Unlimited PTO
Generous paid parental leave
401(k) plan with company match
