Staff Software Engineer, Inference

Added
20 days ago
Type
Full time

Related skills

Python, Kubernetes, Go, CUDA, Triton

📋 Description

  • Lead architecture, performance, and reliability across services.
  • Drive cross-team design initiatives for inference workloads.
  • Optimize latency, throughput, and GPU utilization in production.
  • Tackle scheduling, batching, and memory optimization in Kubernetes infra.
  • Provide hands-on technical leadership shaping engineering direction.
  • Work on distributed systems and Kubernetes-based infrastructure.

🎯 Requirements

  • 8–12+ years building large-scale distributed systems or cloud platforms.
  • Proven cross-team leadership across multiple services.
  • Strong Go, Python, or C++ programming skills.
  • Kubernetes production-scale expertise.
  • Experience with inference frameworks: vLLM, Triton, TorchServe.
  • GPU systems experience: CUDA, NCCL, RDMA, NUMA.

🎁 Benefits

  • Medical, dental, and vision insurance – 100% paid by CoreWeave.
  • 401(k) with generous employer match.
  • Flexible PTO.
  • Tuition reimbursement.
  • Employee Stock Purchase Program (ESPP).
  • Parental leave and family-forming support.