Related skills
networking, kubernetes, distributed systems, observability, firmware
📋 Description
- Spin up and scale large Kubernetes clusters via automation
- Build abstractions to unify clusters for training workloads
- Own bare-metal bring-up and firmware upgrades at massive scale
- Improve key metrics: reduce cluster restart times and shorten upgrade cycles
- Integrate networking and hardware health for end-to-end reliability
- Develop monitoring and observability to detect issues under extreme load
🎯 Requirements
- Experience as an infrastructure/systems or distributed systems engineer in large-scale or high-availability environments
- Kubernetes internals, cluster scaling, and containerized workloads
- Compute infrastructure concepts and automation of cluster/data-center ops
- Bonus: GPU workloads, firmware management, HPC