CoreWeave seeks an experienced HPC Network Architect to design, implement, and optimize high-performance network infrastructures that support GPU-accelerated workloads.
This remote-friendly role partners with Platform Engineering, Data Center Operations, and Software teams to deliver scalable, low-latency networks across on-prem and cloud environments.
Role overview
Design and implement robust network fabrics for HPC and GPU compute workloads. Evaluate and deploy interconnect technologies, optimize latency and throughput, and ensure security, reliability, and observability of the network fabric.
Responsibilities
- Define network topologies and choose interconnect technologies ( Ethernet, InfiniBand, RDMA ) to meet performance goals
- Design for low latency, high bandwidth, and scalable capacity for GPU-accelerated workloads
- Collaborate with Platform Engineering, Data Center Operations, and Software teams to deploy and maintain networks
- Implement monitoring, telemetry, and automation to improve network reliability and performance
- Evaluate new networking technologies and vendors; participate in capacity planning and disaster recovery planning
Requirements
- Strong background in HPC networking and data center fabrics
- Experience with Ethernet, InfiniBand, RDMA, QoS, and network virtualization
- Proficiency with Linux administration and scripting/automation (e.g., Python, Ansible)
- Excellent collaboration and communication skills for cross-functional teams
Nice to have
- Experience with GPU compute environments, cloud networking, or GPU cloud platforms
- Familiarity with monitoring/observability tools (Prometheus, Grafana, SNMP, etc.)