Site Reliability Engineer (SRE)

Added: 8 days ago
Type: Full time
Salary: Not provided

Related skills: Java, AWS, Prometheus, Python, Kubernetes

📋 Description

  • Build and maintain highly reliable, scalable systems.
  • Design dashboards for OS/platform and app metrics (RED/USE).
  • Establish SLIs/SLOs and error budgets for services.
  • Implement performance monitoring and alerting to prevent issues.
  • Participate in on-call rotations and lead incident response.
  • Drive infrastructure automation and deployment.

🎯 Requirements

  • 5+ years in SRE/DevOps or related field.
  • Proficiency in at least two of the following: Python, Shell, Java, Node.js.
  • Cloud: AWS, GCP, or Azure.
  • Containerization and orchestration: Docker and Kubernetes.
  • Monitoring with Prometheus, Grafana, and the ELK stack.
  • Infrastructure as Code: Terraform, Ansible, or similar.
  • Git version control.

🎁 Benefits

  • Blameless post-mortems to learn from failures.
  • Automation-first culture; toil reduction.
  • Growth opportunities in large-scale systems.
  • Work-life balance and sustainable on-call practices.