Senior Site Reliability Engineer - Observability

Added: 11 days ago
Type: Full time

Related skills

Terraform, AWS, Python, Kubernetes, Go

📋 Description

  • Own the Splunk-based observability platform and ecosystem.
  • Automate deployment of agents and collectors across distributed systems with Terraform.
  • Build and maintain scalable observability infrastructure.
  • Participate in on-call rotations and post-incident reviews.
  • Drive observability-driven development to reduce toil and improve reliability.

🎯 Requirements

  • 5+ years of experience running Splunk Cloud at scale (1,000+ services), including Workload Management (WLM) and the HTTP Event Collector (HEC).
  • Expertise in building Splunk dashboards that correlate data across multiple sources.
  • 3+ years in SRE/DevOps or systems engineering for high-availability systems.
  • Programming: strong SPL skills and Go for building tools and automation.
  • Distributed systems: Linux internals, networking, Kubernetes/EKS.
  • Problem solving: data-driven debugging of cross-service bottlenecks.

๐ŸŽ Benefits

  • Benefits package including health, dental, and vision coverage.
  • Social impact and volunteering opportunities.
  • A talented team and community at Okta.
  • In-person onboarding at the SF office for some roles.