Develop service level objectives for large language model serving systems, balancing availability and latency with development velocity.
Design and implement monitoring and observability systems across the token path.
Assist in the design and implementation of high-availability serving infrastructure across multiple regions and cloud providers.
Lead incident response for critical AI services, ensuring rapid recovery, thorough incident reviews, and systematic improvements.
Support the reliability of safeguard model serving as part of Anthropic's safety commitments.

Strong background in distributed systems, infrastructure, or reliability (for example, as an SRE or software engineer).
Comfortable jumping into unfamiliar systems during incidents and driving resolution.
Able to think holistically about how systems compose and where the seams are.
Excellent communication and collaboration skills for partnering with teams across the company.
Diverse experience across product stacks, scaling databases, and large distributed systems.
Experience with AI model serving and observability tools is a plus.

Software Engineer, AI Reliability
