Lead Site Reliability Engineer

Added: 1 day ago
Type: Full time
Salary: Not provided

Related skills

Terraform, AWS, Python, Kubernetes, Go

📋 Description

  • Own reliability and operational health of production systems
  • Lead the NOC: shift structure, escalation paths, incident standards, readiness, reporting
  • Act as senior escalation point and incident commander for high-severity events
  • Design and improve monitoring, alerting, and tooling for early detection
  • Drive root cause analysis and post-incident reviews that produce concrete follow-up actions
  • Build and maintain runbooks, readiness checklists, and service health standards

🎯 Requirements

  • 7+ years in SRE/infrastructure with production ownership
  • Strong AWS production systems experience
  • Kubernetes and containerized services experience
  • Observability across metrics, logs, tracing, alerting
  • Incident response programs, on-call operations, post-incident reviews
  • Infrastructure as code with Terraform

🎁 Benefits

  • Employee Stock Ownership Plan for long-term upside
  • Comprehensive health coverage (medical, dental, vision)
  • Mental health and wellness support
  • Hands-on exposure to key clients in a scaling global tech company
  • Continuous learning through real ownership
  • Direct collaboration with the Founders and tech leadership