Related skills
grpc python go vllm sglang
Description
- Technical leadership: drive end-to-end design of data plane components for AI models.
- System design: architect a high-scale, multi-tenant, resilient AI inference cloud.
- Performance optimization: use tensor/data parallelism, KV cache, and smart routing.
- Collaboration: work with PMs, customer teams, and engineers to align roadmaps.
- Mentorship: coach junior engineers and foster technical excellence.
- Operational excellence: maintain high-scale services with observability and SLOs.
Requirements
- Distributed Systems Expertise: microservices, messaging systems, databases, and IaC.
- AI/ML domain knowledge: hosting LLMs and multimodal models with engines such as vLLM and SGLang.
- Inference frameworks: llm-d, NVIDIA Dynamo, Ray Serve.
- Hardware & Interconnects: GPU optimization; NVLink, xGMI, RoCE.
- Architecture proficiency: LLM architectures and optimization (continuous batching, quantization).
- Software engineering: Go or Python; gRPC.
Benefits
- Career development resources and LinkedIn Learning.
- Well-being: EAP, local meetups, flexible time off.
- Equity grants and Employee Stock Purchase Program.
- Bonus potential based on performance.
- Competitive benefits and equal opportunity employer.