Staff Backend Software Engineer (AI Platform)

Type: Full time

Related skills

distributed systems, observability, routing, autoscaling, GPU

📋 Description

  • Design and implement the scalable core systems and APIs that power Model Serving.
  • Partner with product and engineering to define the roadmap and long-term architecture for serving workloads.
  • Drive architectural decisions and trade-offs to optimize performance, throughput, autoscaling, and efficiency for CPU and GPU serving.
  • Contribute to key components across the serving infrastructure, from model container builds and deployment to runtime systems such as routing, caching, observability, and autoscaling.
  • Collaborate cross-functionally with product, platform, and research teams to translate customer needs into reliable, performant systems.
  • Lead initiatives that improve latency, availability, and cost-effectiveness across serving layers.

🎯 Requirements

  • 10+ years of experience building and operating large-scale distributed systems.
  • Deep expertise in model serving, inference systems, and related infrastructure (routing, scheduling, autoscaling, observability).
  • Strong foundation in algorithms, data structures, and system design for large-scale, low-latency serving.
  • Proven track record of delivering technically complex, high-impact initiatives with measurable value.
  • Experience leading architecture for large-scale CPU/GPU inference systems.
  • Strong communication and collaboration skills in fast-moving environments.

🎁 Benefits

  • Benefits vary by region; see https://www.mybenefitsnow.com/databricks.