Automated infrastructure: design, build, and maintain scalable observability with Terraform.
GCP observability: optimize collection, processing, and storage; keep Splunk and Grafana latency low.
Incident Response: participate in on-call rotations and lead post-incident reviews.
Automation: eliminate toil by deploying/scaling observability agents and collectors.

🎯 Requirements

GKE: 5+ years scaling observability on Google Cloud
Visualization: create Splunk or Grafana dashboards across sources
SRE Mindset: 3+ years in SRE/DevOps or systems engineering for high availability (HA)
Programming: Python and Go for tooling and automation
Distributed Systems: Linux internals, networking, Kubernetes/GKE
Telemetry (bonus): OpenTelemetry or Vector; Grafana Loki; AWS tools

🎁 Benefits

Benefits: health, dental, vision, 401(k), FSA, and paid leave
Social Impact: Okta for Good
Onboarding: some roles may require travel for in-person onboarding
