Related skills
documentation datadog grafana prometheus splunk
📋 Description
- 24x7 monitoring of cloud SaaS infrastructure and apps.
- Incident management as first responder; escalate to SMEs.
- Deployments and change management; coordinate with SMEs.
- Troubleshooting with diagnostics to resolve issues in real time.
- Define/refine SLAs, SLOs, SLIs; run regular health checks.
- Analysis and reporting: review incident data and create shift reports.
🎯 Requirements
- Proficiency with Prometheus, Grafana, Datadog, or Splunk.
- Remain composed during high-stakes incidents and resolve them quickly.
- Strong verbal and written communication for incident reporting.
- Education: B.Sc IT, B.Sc Computers, BCA or equivalent.
- Experience: 1-3 years in reliability ops or 24x7 SaaS/cloud support.