Senior Staff Production Engineer

Type
Full time

Related skills

Azure, AWS, Grafana, Prometheus, Python

📋 Description

  • Design and implement highly available, scalable infrastructure across AWS, Azure, GCP, and bare metal.
  • Drive automation by writing Python/Go to remove toil and build self-healing systems.
  • Improve observability with Prometheus, Grafana, OpenTelemetry; define SLIs/SLOs and error budgets.
  • Lead as Incident Commander during on-call rotations; develop response playbooks and post-incident analyses.
  • Partner with Engineering for operability reviews and system maturity improvements.
  • Hybrid role (3 days in San Jose, CA) or remote.

🎯 Requirements

  • 8+ years of experience in reliability and scalability for large-scale production services.
  • Deep programming expertise: Python, Go, or C/C++.
  • Strong knowledge of networking, Linux/FreeBSD, and distributed architectures.
  • Experience in high-stakes incident management and 24/7 on-call rotation.
  • Experience using ITIL workflows and incident data to drive service maturity.
  • Extensive cloud experience with AWS, Azure, and GCP, and with IaC using Ansible and Terraform.

🎁 Benefits

  • Various health plans
  • Time off plans for vacation and sick time
  • Parental leave options
  • Retirement options
  • Education reimbursement
  • In-office perks, and more!