Own and improve reliability, availability, and performance on GCP.
Participate in incident management: detection, triage, mitigation, recovery.
Improve incident workflows and tooling (ServiceNow) to support clear ownership and rapid communication.
Design and operate observability: metrics, logs, traces, dashboards (Splunk/OpenTelemetry).
Reduce toil via automation and SRE best practices.
Support on-call rotations across time zones for 24/7 coverage.
Define, monitor, and report SLIs, SLOs, and error budgets.
Drive high availability through proactive reliability engineering.
