Related skills
gRPC, Golang, Python, Ray Serve, NVLink

Description
- Lead design and delivery of high-scale data plane components for AI models.
- Architect scalable, multi-tenant AI inference cloud systems.
- Optimize distributed inference with tensor/data parallelism and caching.
- Collaborate with product, customers, and other engineering teams.
- Mentor junior engineers and promote technical excellence.
- Operate critical services with observability and SLOs.
Requirements
- Distributed systems: microservices, messaging, databases, and IaC.
- Experience hosting LLMs with inference engines such as vLLM, SGLang, or Modular.
- Familiarity with distributed inference frameworks such as llm-d, NVIDIA Dynamo, or Ray Serve.
- GPU optimization and high-speed interconnects: NVLink, XGMI, RoCE.
- Understanding of LLM architectures and inference techniques such as continuous batching and quantization.
- Proficiency in Go (Golang) or Python, plus gRPC.
- Cloud operations in high-scale environments; shipping customer-facing software.
- Open-source mindset; experience integrating open-source software.
Benefits
- Career development resources and training reimbursements.
- Well-being programs and flexible time off.
- Equity grants and Employee Stock Purchase Program.
- LinkedIn Learning access for ongoing development.
- We're an equal-opportunity employer.
- Org highlights and global benefits vary by location.