Related skills
datadog docker aws python kubernetes
Description
- Design systems with resilience and capacity in mind.
- Define and measure SLOs/SLIs reflecting customer impact.
- Use Datadog with CloudWatch for signal-heavy observability.
- Configure alerting and on-call routing via incident.io.
- Improve the incident lifecycle, from detection through postmortems.
- Build reliable, debuggable systems; minimize 2 a.m. alerts.
Requirements
- Software background; curiosity about large-scale production systems.
- Proficient in Python or Go; automation experience a plus.
- Some exposure to AWS, Docker, and Kubernetes.
- Familiarity with metrics, logs, traces, and general monitoring concepts.
- Awareness of SLOs/SLIs and of why reliability matters to end users.
- BS/BE in Computer Science or Engineering plus 1+ year of SRE, DevOps, or engineering experience (internships count); familiarity with AI tooling.
Benefits
- Healthcare
- Internet / cell phone reimbursement
- Learning and development stipend
- Opportunities to travel to Palo Alto HQ and Bangkok Site