Added
2 hours ago
Location
Type
Full time
Related skills
tensorflow pytorch cuda inference triton
Description
- Lead next-gen model inference and feature serving for 100x larger models.
- Design low-latency, high-throughput inference pipelines to meet SLOs.
- Collaborate to productionize new model architectures (LLMs, ranking) and scale globally.
- Evolve online feature platform for coverage, freshness, consistency.
- Evaluate GPU acceleration, model compression, Triton, vLLM, Dynamo.
- Partner with infra/ML teams to boost reliability and velocity.
Requirements
- BS degree in Computer Science or related field.
- 8+ years designing and operating large-scale ML or distributed infrastructure.
- Deep knowledge of Java, C++, Python.
- Distributed systems or ads infra (routing, storage, caching).
- Hands-on with PyTorch or TensorFlow.
- Proven track record leading complex projects and mentoring.
Benefits
- Hybrid work model; in-person 1-2 days per week near Palo Alto/SF/Seattle.
- PinFlex flexible working options.