Added
1 hour ago
Location
Type
Full time
Related skills
linux, kubernetes, gpu, slurm, infiniband
Description
- Own the technical path from facility/rack design to a production-ready supercomputer.
- Lead GPU cluster bring-up, InfiniBand/RoCE fabric validation, and HPC benchmarking.
- Define models for managing customer bare-metal fleets, both at rack level and through Bare Metal as a Service (BMaaS).
- Partner with Data Center Ops, Fleet Ops, Networking, and Product Eng to shape the product roadmap.
- Shape product strategy with field insights and customer feedback.
- Stay current on Kubernetes, cloud infrastructure, and industry trends.
Requirements
- B.S. in Computer Science or related field, or equivalent experience.
- 7+ years in cloud infrastructure or field engineering, focused on bare-metal compute and large GPU clusters.
- Deep expertise with rack-scale GPU hardware (NVIDIA HGX) and InfiniBand/NVLink.
- Expert Linux administration and networking (TCP/IP, routing, fabric topologies).
- Hands-on experience with GPU cluster bring-up (PXE boot, health checks, fabric validation) and with Kubernetes and Slurm integration.
- Proven ability to communicate with customers and translate complex concepts for technical and non-technical audiences.
Benefits
- Medical, dental, and vision insurance (100% paid by CoreWeave).
- Company-paid life insurance.
- Disability insurance.
- Flexible Spending Account and Health Savings Account.
- 401(k) with employer match.
- Flexible PTO and parental leave.