Senior Site Reliability Engineer

Added
less than a minute ago
Type
Full time

Related skills

docker ansible terraform linux prometheus

📋 Description

  • Design, implement, and operate scalable production services.
  • Build alerting pipelines, dashboards, and SLO monitoring.
  • Lead end-to-end incident response and post-mortems.
  • Develop and extend IaC coverage; build internal tooling.
  • Mentor SRE I/II through code reviews and knowledge sharing.
  • Apply LLM-driven log analysis and AI tooling to incidents.

🎯 Requirements

  • 6+ years in SRE/Platform Engineering with reliability programs
  • Kubernetes and Docker in production
  • Advanced Linux troubleshooting: kernel internals, TCP/IP, DNS, load balancers
  • Python for automation and Bash scripting
  • Ansible and Terraform or Pulumi; mastery of Icinga, Prometheus, Grafana
  • Understanding of LLMs, embeddings, and ML pipelines for AI-ops

🎁 Benefits

  • Competitive health benefits
  • Retirement savings options (e.g., 401k)
  • Equity grants and employee stock purchase plan
  • Paid time off and parental leave
  • Diversity and inclusion programs