Staff Site Reliability Engineer - Observability

Added: 2 days ago
Type: Full time

Related skills

Terraform, Grafana, Python, Kubernetes, Go

📋 Description

  • Automated infrastructure: design, build, and maintain scalable observability infrastructure using Terraform.
  • GCP observability: optimize how telemetry data is collected, processed, and stored on GCP, and keep latency low in Splunk and Grafana.
  • Incident response: participate in on-call rotations and lead post-incident reviews.
  • Automation: eliminate toil by automating the deployment and scaling of observability agents and collectors.

🎯 Requirements

  • GKE: 5+ years of experience scaling observability on Google Cloud
  • Visualization: experience building Splunk or Grafana dashboards that combine multiple data sources
  • SRE mindset: 3+ years in SRE, DevOps, or systems engineering for high-availability (HA) systems
  • Programming: proficiency in Python and Go for tooling and automation
  • Distributed systems: knowledge of Linux internals, networking, and Kubernetes/GKE
  • Telemetry (bonus): experience with OpenTelemetry or Vector, Grafana Loki, and AWS tooling

๐ŸŽ Benefits

  • Benefits: health, dental, vision, 401(k), FSA, and paid leave
  • Social Impact: Okta for Good
  • Onboarding: some roles may require travel for in-person onboarding.