Senior Engineer 2: Inference Data Plane

Added
26 minutes ago
Type
Full time

Related skills

gRPC, Go, Python, Ray Serve, NVLink

📋 Description

  • Lead design and delivery of high-scale data plane components for AI models.
  • Architect scalable, multi-tenant AI inference cloud systems.
  • Optimize distributed inference with tensor/data parallelism and caching.
  • Collaborate with product, customers, and other engineering teams.
  • Mentor junior engineers and promote technical excellence.
  • Operate critical services with observability and SLOs.

🎯 Requirements

  • Distributed systems: microservices, messaging, databases, and IaC.
  • Hosting LLMs with engines such as vLLM, SGLang, and Modular.
  • Inference frameworks such as llm-d, NVIDIA Dynamo, and Ray Serve.
  • GPU optimization; NVLink, XGMI, and RoCE interconnects.
  • LLM architectures; continuous batching, quantization.
  • Go or Python; gRPC proficiency.
  • Cloud operations in high-scale environments; shipping customer-facing software.
  • Open-source mindset; experience integrating open-source software.

🎁 Benefits

  • Career development resources and training reimbursements.
  • Well-being programs and flexible time off.
  • Equity grants and Employee Stock Purchase Program.
  • LinkedIn Learning access for ongoing development.
  • We're an equal-opportunity employer.
  • Org highlights and global benefits vary by location.