Type
Full time
Related skills
datadog terraform aws python kubernetes
Description
- Set AI-centered reliability strategy; define SLIs, SLOs, and error budgets across services.
- Redesign the incident lifecycle around AI-assisted speed; lead high-severity incident response.
- Improve on-call through automation; build AI agents that draft runbooks.
- Push AI-first operations into product engineering workflows.
- Architect for resilience at scale across EarnIn's AWS footprint (EKS, Kafka, DynamoDB, RDS, SQS).
- Mentor engineers; raise reliability standards; build docs and tooling.
Requirements
- 7+ years in SRE or related roles with KPI-driven reliability.
- Experience applying AI/LLMs to production workflows (alert triage, runbooks, investigations).
- Deep experience with SLOs/SLIs, error budgets, and blameless postmortems at scale.
- Software engineering ability (Python, Go) for building tools and automation.
- Observability expertise (Datadog, CloudWatch, OpenTelemetry).
- IaC proficiency (Terraform, Kubernetes, AWS) with safe deployments.
- Proficiency with AI-assisted development tools (Cursor, Claude Code, Copilot).
- Fintech or regulated environment experience (SOC 2, PCI) a plus.
Benefits
- Equity and benefits.
- Hybrid work; in-office 2 days/week in Mountain View.
- Professional growth opportunities.
- HQ Mountain View, CA.