Overview

Alchemy is seeking a Senior Site Reliability Engineer to join our platform reliability team. You will design, build, and operate scalable, highly available services powering Alchemy's products. You will own incident response, monitoring, and reliability improvements across distributed cloud-based services.

Responsibilities

Design, implement, and maintain scalable infrastructure and CI/CD pipelines.
Own on-call rotations and lead incident response and post-incident reviews.
Build and maintain the observability stack using Prometheus, Grafana, and related tooling.
Drive reliability improvements with infrastructure as code (Terraform, Kubernetes).
Collaborate with software engineers to define SLOs/SLIs and perform capacity planning.
Document runbooks and maintain robust incident playbooks.

Requirements

5+ years of Site Reliability Engineering, DevOps, or equivalent experience.
Strong Linux system administration and networking fundamentals.
Hands-on experience with cloud providers (AWS, GCP, Azure).
Proficiency with Kubernetes and container orchestration.
Experience with Prometheus, Grafana, and monitoring/observability tooling.
Infrastructure as Code experience (Terraform, CloudFormation).
Proficiency in at least one programming language (Go, Python, or similar).
Excellent communication and collaboration skills.

Nice-to-have

Experience with chaos engineering, SRE practices, and incident management tooling.
Experience with security-conscious design and compliance requirements.

Benefits

Competitive compensation and stock options, comprehensive health insurance, retirement plan, flexible work arrangements, generous PTO, and opportunities for professional growth at Alchemy.
