Staff Site Reliability Engineer

Type: Full time

Related skills: Datadog, Terraform, AWS, Python, Kubernetes

📋 Description

  • Set AI-centered reliability strategy; define SLIs, SLOs, and error budgets across services.
  • Redesign the incident lifecycle around AI-assisted speed; lead high-severity incident response.
  • Improve on-call through automation; build AI agents that draft runbooks.
  • Push AI-first operations into product engineering workflows.
  • Architect for resilience at scale across EarnIn's AWS footprint (EKS, Kafka, DynamoDB, RDS, SQS).
  • Mentor engineers; raise reliability standards; build docs and tooling.

🎯 Requirements

  • 7+ years in SRE or related roles with KPI-driven reliability.
  • Experience applying AI/LLMs to production workflows (alert triage, runbooks, investigations).
  • Deep experience with SLOs/SLIs, error budgets, and blameless postmortems at scale.
  • Software engineering skills (Python, Go) for building tools and automation.
  • Observability expertise (Datadog, CloudWatch, OpenTelemetry).
  • Proficiency with IaC and cloud infrastructure (Terraform, Kubernetes, AWS), including safe deployment practices.
  • Proficiency with AI-assisted development tools (Cursor, Claude Code, Copilot).
  • Experience in fintech or other regulated environments (SOC 2, PCI) is a plus.

🎁 Benefits

  • Equity and benefits.
  • Hybrid work; in-office 2 days/week in Mountain View.
  • Professional growth opportunities.
  • HQ Mountain View, CA.