Added: 4 days ago
Type: Full time
Salary: Not provided

Related skills

Datadog, AWS, Grafana, Prometheus, Zabbix

📋 Description

  • Ensure high uptime for SaaS services by monitoring infrastructure, applications, and networks with Zabbix, Grafana, and Prometheus.
  • Identify anomalies, thresholds, and bottlenecks to prevent incidents.
  • Respond to alerts in real time; triage across Windows/Linux servers, apps, and networks; validate service health.
  • Validate application availability via endpoints, API checks, and health checks.
  • Run routine operations: health checks, patches, backups, and disaster recovery (DR) using AWS tools; monitor resources and costs.
  • Contribute to SOPs, runbooks, and knowledge base; participate in Change Management processes.

🎯 Requirements

  • Associate or Bachelor's degree in IT, Networking, or a related field.
  • 2–5 years in a NOC, Cloud Ops, or Network Support role.
  • Strong networking: TCP/IP, DNS, DHCP, VPN.
  • Solid Linux and Windows OS knowledge.
  • Experience with monitoring tools: Zabbix, Grafana, Prometheus, Sumo Logic, Datadog, ELK.
  • ITIL: Incident, Change, and Problem Management.
  • Troubleshoot SaaS performance and cloud-based outages.
  • Willing to work a 24/7 shift-based model.
  • Effective incident reporting and documentation.
  • Working knowledge of AWS.

๐ŸŽ Benefits

  • Inclusive, diverse team culture.
  • Equal opportunity employer.
  • Opportunities to grow in a 24/7 NOC.
  • Competitive benefits package.