Senior Site Reliability Engineer

Added: 1 day ago
Type: Full-time
Salary: Not provided

Related skills

Terraform, AWS, Grafana, Prometheus, Kubernetes

📋 Description

  • Own reliability, availability, and performance of the systems behind k-ID’s platform and public APIs
  • Design and improve scalable AWS and Kubernetes infrastructure for global workloads
  • Build observability across logs, metrics, tracing, alerting, and service health
  • Improve deployment safety with CI/CD workflows, release controls, and rollback paths
  • Drive incident response and production readiness with runbooks, on-call hygiene, and postmortems
  • Reduce operational toil by automating repetitive tasks and tooling

🎯 Requirements

  • 5+ years of experience in infrastructure, platform engineering, SRE, or software engineering with production ownership
  • Strong experience running production systems in AWS
  • Strong hands-on experience with Kubernetes and container-based workloads
  • Experience with infrastructure as code, preferably Terraform
  • Experience designing and operating observability stacks (Prometheus, Alertmanager, Grafana, OpenTelemetry)
  • Ability to write code and automation in Go, Python, or TypeScript