Engineering Manager, Model Routing & Inference

Added: 2 days ago
Type: Full time
Salary: Not provided

Related skills: GPU, vLLM, TensorRT-LLM, TGI

📋 Description

  • Lead the Model Routing & Inference team, owning the inference platform that powers all AI interactions.
  • Own the full inference path end to end: latency, reliability, and cost optimization at scale.
  • Set technical direction for cluster management, inference optimization, and traffic egress.
  • Build a platform that lets product teams ship quickly without wrestling with provider complexity.
  • Hire and lead engineers, setting direction while balancing latency, cost, reliability, and user experience.
  • Drive key projects: the inference gateway, model selection, GPU utilization, and routing control.

🎯 Requirements

  • Experience leading teams that build high-throughput, low-latency distributed systems (e.g., inference, routing).
  • Ability to reason about cost/performance tradeoffs at scale (GPU fleets, capacity planning) with incomplete information.
  • Strong software fundamentals; has shipped production systems handling millions of requests.
  • Experience with model-serving frameworks (vLLM, TensorRT-LLM, TGI) and load balancing.
  • Experience hiring and growing teams, with a track record of coaching and mentorship.
  • Judgment to make calls balancing reliability, cost, latency, and user experience in ambiguous situations.