Related skills
distributed systems, APIs, scalability, GPU, vLLM

Description
- Design and build high-throughput, low-latency GPU inference systems.
- Shape the architecture of the foundation model API across teams.
- Design core systems and APIs powering Foundation Model Serving with scalability and reliability.
- Drive architectural trade-offs to optimize performance and autoscaling for GPU serving.
- Contribute to key components across serving infra including vLLM and SGLang.
- Collaborate across product/platform/research; mentor engineers.
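To make the serving-infrastructure work above concrete, here is a purely illustrative, toy sketch of the continuous-batching idea used by engines such as vLLM and SGLang: each decode step, the scheduler admits waiting requests up to a batch cap, advances every active request by one token, and retires finished requests to free slots. All class and field names here are hypothetical; real engines add paged KV caches, preemption, and GPU kernels.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    generated: list = field(default_factory=list)

class ContinuousBatcher:
    """Toy stand-in for a continuous-batching scheduler (hypothetical
    API, not the vLLM interface): requests join and leave the running
    batch between decode steps instead of waiting for a full batch."""

    def __init__(self, max_batch_size: int):
        self.max_batch_size = max_batch_size
        self.waiting = deque()
        self.active = []
        self.finished = []

    def submit(self, req: Request):
        self.waiting.append(req)

    def step(self):
        # Admit waiting requests while there is batch capacity.
        while self.waiting and len(self.active) < self.max_batch_size:
            self.active.append(self.waiting.popleft())
        # One decode step per active request (fake token = step index).
        for req in self.active:
            req.generated.append(f"tok{len(req.generated)}")
        # Retire requests that hit their token budget, freeing slots
        # for the next admission pass.
        still_active = []
        for req in self.active:
            if len(req.generated) >= req.max_new_tokens:
                self.finished.append(req)
            else:
                still_active.append(req)
        self.active = still_active

batcher = ContinuousBatcher(max_batch_size=2)
for i, n in enumerate([1, 3, 2]):
    batcher.submit(Request(prompt=f"p{i}", max_new_tokens=n))
while batcher.active or batcher.waiting:
    batcher.step()
print([len(r.generated) for r in batcher.finished])  # → [1, 3, 2]
```

Note how the short request (1 token) exits after the first step, letting the third request join the batch immediately; that slot reuse is the throughput win continuous batching delivers over static batching.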
Requirements
- 10+ years building and operating large-scale distributed systems.
- Experience leading high-scale, operationally sensitive backend systems.
- Track record of elevating teams' engineering excellence.
- Strong foundation in algorithms, data structures, and system design for low-latency serving.
- Proven ability to deliver technically complex, high-impact initiatives.
- Strong communication and cross-team collaboration in fast-moving environments.
Benefits
- Comprehensive, region-specific benefits.
- Career growth and mentorship opportunities.
- Inclusive, diverse engineering culture.