Related skills
datadog aws grafana prometheus zabbix
Description
- Ensure high uptime for SaaS services by monitoring infrastructure, applications, and networks with Zabbix, Grafana, and Prometheus.
- Identify anomalies, threshold breaches, and bottlenecks before they become incidents.
- Respond to alerts in real time; triage issues across Windows/Linux servers, applications, and networks; and validate service health.
- Validate application availability through endpoint monitoring, API checks, and health checks.
- Run routine operations (health checks, patching, backups, and disaster recovery) using AWS tools; monitor resource usage and costs.
- Contribute to SOPs, runbooks, and knowledge base; participate in Change Management processes.
Requirements
- Associate's or Bachelor's degree in IT, Networking, or a related field.
- 2–5 years of experience in a NOC, Cloud Ops, or Network Support role.
- Strong networking knowledge: TCP/IP, DNS, DHCP, and VPN.
- Solid Linux and Windows OS knowledge.
- Experience with monitoring tools: Zabbix, Grafana, Prometheus, Sumo Logic, Datadog, ELK.
- Familiarity with ITIL processes: Incident, Change, and Problem Management.
- Ability to troubleshoot SaaS performance issues and cloud-based outages.
- Willingness to work in a 24/7 shift-based model.
- Effective incident reporting and documentation skills.
- Working knowledge of AWS.
Benefits
- Inclusive, diverse team culture.
- Equal opportunity employer.
- Opportunities to grow in a 24/7 NOC.
- Competitive benefits package.