Related skills
datadog observability cloudwatch ai tools incident.io
Description
- Design resilient systems with capacity planning in mind.
- Define and measure SLOs/SLIs that reflect customer impact.
- Use Datadog and CloudWatch for signal-heavy observability.
- Configure alerting and routing via incident.io for on-call.
- Improve incident lifecycle from detection to follow-up.
- Build highly available, debuggable systems with strong developer experience.
Requirements
- Bachelor's or Master's in Computer Science, Engineering, or related field.
- 3+ years in SRE or Software Engineering.
- Hands-on coding experience in any two programming languages.
- Experience managing production environments.
- Strong belief in observability as essential for reliability.
- Experience using SLOs/SLIs/KPIs to guide decisions and tradeoffs.
Benefits
- Healthcare
- Internet/cell phone reimbursement
- Learning and development stipend
- Travel opportunities to the Palo Alto HQ and the Bangkok site