Related skills: distributed systems, observability, routing, autoscaling, GPU
📋 Description
- Design and implement the core systems and APIs powering Model Serving at scale.
- Partner with product/engineering to define roadmap and long-term architecture for serving workloads.
- Drive architectural decisions and trade-offs to optimize performance, throughput, autoscaling, and efficiency for CPU and GPU serving.
- Contribute to key components across the serving infrastructure—from model container builds and deployment to runtime systems like routing, caching, observability, and autoscaling.
- Collaborate cross-functionally with product, platform, and research teams to translate customer needs into reliable, performant systems.
- Lead initiatives that improve latency, availability, and cost-effectiveness across serving layers.
🎯 Requirements
- 10+ years of experience building and operating large-scale distributed systems.
- Deep expertise in model serving, inference systems, and related infrastructure (routing, scheduling, autoscaling, observability).
- Strong foundation in algorithms, data structures, and system design for large-scale, low-latency serving.
- Proven track record of delivering technically complex, high-impact initiatives with measurable value.
- Experience leading architecture for large-scale CPU/GPU inference systems.
- Strong communication and collaboration skills in fast-moving environments.
🎁 Benefits
- Benefits vary by region; see https://www.mybenefitsnow.com/databricks.