Related skills: distributed systems, observability, routing, autoscaling, GPU
📋 Description
- Design and implement the core systems and APIs powering Model Serving at scale.
- Partner with product/engineering to define roadmap and long-term architecture for serving workloads.
- Drive architectural decisions and trade-offs to optimize performance, throughput, autoscaling, and efficiency for CPU and GPU serving.
- Contribute to key components across the serving infrastructure—from model container builds and deployment to runtime systems like routing, caching, observability, and autoscaling.
- Collaborate cross-functionally with product, platform, and research teams to translate customer needs into reliable, performant systems.
- Lead initiatives that improve latency, availability, and cost-effectiveness across serving layers.
🎯 Requirements
- 10+ years of experience building and operating large-scale distributed systems.
- Deep expertise in model serving, inference systems, and related infrastructure (routing, scheduling, autoscaling, observability).
- Strong foundation in algorithms, data structures, and system design for large-scale, low-latency serving.
- Proven track record of delivering technically complex, high-impact initiatives with measurable value.
- Experience leading architecture for large-scale CPU/GPU inference systems.
- Strong communication and collaboration skills in fast-moving environments.
🎁 Benefits
- Benefits vary by region; see https://www.mybenefitsnow.com/databricks.