Site Reliability Engineer (SRE) — Monstro US

Added: less than a minute ago
Type: Full time

Related skills

BigQuery, Terraform, GitHub Actions, Kubernetes, Google Cloud Platform

📋 Description

  • Define and maintain SLOs and SLIs for tier-1 services
  • Build dashboards and alerts in Google Cloud Monitoring
  • Tune alert routing to make pages actionable
  • Instrument services for distributed tracing and structured logging
  • Own error budgets and prioritize reliability work over feature work
  • Serve on-call as first responder for production alerts and drive mitigation

🎯 Requirements

  • Production experience on GCP (or AWS/Azure depth with willingness to ramp on GCP)
  • Comfortable on-call: running incidents and shipping the resulting action items
  • Strong observability fundamentals: SLOs, log-based metrics, alert hygiene, dashboards
  • Working knowledge of Kubernetes, API gateways, identity systems, and at least one IaC tool
  • Scripting / coding fluency (Python, Go, Bash) for automation and tooling
  • Good written communication — handoffs, postmortems, and runbooks are part of the job

🎁 Benefits

  • Ownership and impact: Shape the future of AI-powered finance
  • Experienced team: Leadership with a track record of scaling companies
  • Principles-driven culture: Speed, ownership, and impact
  • Comprehensive benefits package: Health, vision, dental, and disability coverage