Related skills
Golang, Python, Distributed Systems, LLM, Ray Serve

Description
- Technical Leadership: Lead design and delivery of data plane components for large generative AI models.
- System Design: Architect high-scale, multi-tenant AI inference cloud components.
- Performance Optimization: Optimize distributed inference with tensor/data parallelism and caching.
- Collaboration: Align roadmaps with PMs, customer teams, and engineers.
- Mentorship: Mentor junior engineers and foster technical excellence.
- Operational Excellence: Maintain high-scale services with observability and SLOs.
Requirements
- Distributed Systems Expertise: Experience building distributed systems with microservices, messaging, databases, and infrastructure as code.
- AI/ML Domain Knowledge: Hosting large language or multimodal models with inference engines (vLLM, SGLang, Modular).
- Inference Frameworks: Distributed inference serving frameworks (llm-d, NVIDIA Dynamo, Ray Serve).
- Hardware & Interconnects: GPU optimization and interconnects (NVLink, xGMI, RoCE).
- Architecture Proficiency: LLM architectures and optimizations (batching, quantization).
- Software Engineering: Expert in Go or Python; experience with gRPC.
- Cloud Operations: Shaping customer-facing software in high-scale environments.
- Open Source Mindset: Building with open-source software.
Benefits
- Innovate with purpose; build cloud/AI for builders.
- Career development resources, conferences, training, LinkedIn Learning.
- Well-being programs and global benefits (vary by region).
- Competitive compensation with equity and ESPP options.
- Equal-opportunity employer and inclusive culture.
- Employee assistance, local meetups, and growth opportunities.