Overview
2K is seeking a Senior Site Reliability Engineer to join our infrastructure team in Novato, California. You will design, build, and operate scalable, highly available systems that power our services, automate deployments, and drive reliability improvements across our platform.
Responsibilities
- Design, implement, and maintain reliable, scalable systems in cloud environments (e.g., AWS).
- Lead incident response, conduct post-incident reviews, and drive root-cause analysis and follow-up improvements.
- Instrument systems with monitoring, logging, and tracing to improve observability (e.g., Prometheus, Grafana).
- Collaborate with software engineering and platform teams to enhance reliability, performance, and deployment processes.
- Develop automation for deployments, provisioning, and configuration management (e.g., Terraform, CI/CD pipelines).
Qualifications
- 5+ years of experience in Site Reliability Engineering, DevOps, or a related field.
- Strong expertise with Linux, cloud platforms (AWS preferred), container orchestration (Kubernetes), and observability tooling.
- Programming/scripting skills in Go, Python, or Bash.
- Experience with incident management, disaster recovery, and capacity planning.
Benefits
We offer a competitive compensation and benefits package. This role is based in Novato, CA, with an emphasis on on-site collaboration.