Related skills
Datadog, Terraform, Grafana, Prometheus, Kubernetes
📋 Description
- Design, build, and operate reliable and performant systems used across engineering.
- Identify and fix performance bottlenecks; ensure scalability of infrastructure.
- Dig deep to resolve complex issues.
- Continuously improve automation; improve internal tooling and developer experience.
- Contribute to incident response, postmortems, and development of best practices for reliability and scalability.
🎯 Requirements
- 4+ years of relevant industry experience; 2+ years leading large-scale projects or teams.
- Experience with distributed systems at scale, with attention to reliability, scalability, and security.
- Proven experience as a reliability or production engineer at fast-growing companies.
- Cloud infra (AWS, GCP, Azure) and Terraform.
- Kubernetes and container orchestration.
- Observability tools: Datadog, Prometheus, Grafana, ELK.
- Microservices architecture and service mesh familiarity.
- Security best practices in cloud environments.