Site Reliability Engineer II

Added: 17 days ago
Type: Full time
Salary: Not provided

Related skills

Datadog, observability, CloudWatch, AI tools, incident.io

📋 Description

  • Design resilient systems with capacity planning in mind.
  • Define and measure SLOs/SLIs that reflect customer impact.
  • Use Datadog and CloudWatch to build observability with rich, actionable signals.
  • Configure alerting and on-call routing via incident.io.
  • Improve the incident lifecycle, from detection through follow-up.
  • Build highly available, debuggable systems with a strong developer experience.

🎯 Requirements

  • Bachelor's or Master's in Computer Science, Engineering, or related field.
  • 3+ years in SRE or Software Engineering.
  • Hands-on coding experience in at least two programming languages.
  • Experience managing production environments.
  • Strong belief in observability as essential for reliability.
  • Experience using SLOs/SLIs/KPIs to guide decisions and tradeoffs.

๐ŸŽ Benefits

  • Healthcare
  • Internet/cell phone reimbursement
  • Learning and development stipend
  • Travel opportunities to the Palo Alto HQ and the Bangkok site