Software Engineer — GPU Networking & Distributed Systems

Added: 21 hours ago
Type: Full time

Related skills

InfiniBand, RDMA, NCCL, TensorRT-LLM, RoCE v2

📋 Description

  • Make RDMA First-Class: integrate RDMA, RoCE, and InfiniBand into the inference stack.
  • Optimize Distributed Inference: tune networking for KV cache offload and wide expert parallelism (WideEP).
  • Enable serverless-grade startup for trillion-parameter models.
  • Deep-dive into hardware: validate networking on H100/H200/NVL72 clusters.
  • Build Observability: visualize packet flow, congestion, and bandwidth.
  • Optimize Kernels: work with NCCL/NVSHMEM and custom kernels.

🎯 Requirements

  • Deep experience with high-performance networking protocols (InfiniBand, RoCE v2).
  • Fluent in C++ or Python, with a working understanding of the GPU memory hierarchy on H100 and Blackwell.
  • Able to bridge software and hardware, for example by debugging NVLink topology issues.
  • Know when to use off-the-shelf versus custom solutions to meet performance targets.
  • Knowledge of NCCL, NVSHMEM, and UCX for GPU interconnects.
  • Familiarity with TensorRT-LLM, vLLM, or SGLang.
  • Experience running low-level benchmarks to qualify new hardware clusters.

🎁 Benefits

  • Competitive compensation with equity.
  • Medical, dental, and vision coverage for you and dependents.
  • Generous PTO, including a winter break.
  • Paid parental leave.
  • Company 401(k) program.
  • Exposure to ML startups for learning and networking.