Related skills: Datadog, Python, distributed systems, Go, observability
Description
- Design systems with resilience, capacity, and graceful degradation.
- Define and measure SLOs/SLIs that reflect customer experience.
- Use Datadog and CloudWatch for signal-heavy, noise-light observability.
- Configure alerting and on-call routing via incident.io.
- Improve incident lifecycle: fast detection, triage, clear follow-ups.
- Build highly available, easy-to-debug systems with boring deployments.
Requirements
- A bachelor's or master's degree in computer science or equivalent.
- 4+ years of experience in an SRE or software engineering role.
- Hands-on coding in Python and/or Go.
- Distributed systems design, operation, and production support.
- Reliability mindset: SLOs, SLIs, error budgets, MTTR.
- Observability and incident response: diagnose from logs and metrics.
Benefits
- Healthcare coverage.
- Internet and cell phone reimbursement.
- Learning and development stipend.
- Potential travel to Mountain View HQ.