Related skills
datadog mongodb redis grafana prometheus
Description
- Monitor platform health and performance across systems
- Respond to alerts and incidents using runbooks
- Escalate issues to SRE/Platform Engineers as needed
- Maintain uptime and reliability with operational processes
- Update runbooks and participate in post-incident reviews
- Communicate status clearly during incidents
Requirements
- 1-3 years in technical operations or entry-level cloud roles
- Familiarity with AWS, GCP, Azure, and Kubernetes
- Monitoring tools: Datadog, Prometheus, Grafana
- Strong troubleshooting and problem-solving skills
- Scripting/automation with Python or Bash is a bonus
- Familiarity with ITIL and incident management
Benefits
- Competitive compensation including equity
- Retirement and ESPP
- Flexible paid time off
- Medical, dental, vision, life, and disability plans
- Fertility benefits and equal paid parental leave
- Professional development stipend and career pathing
- A curated in-office experience fostering community
- Volunteer Week and donation matching
- Employee Resource Groups and inclusive culture
- Great Place to Work culture