Design, build, and operate reliable and performant systems used across engineering.
Identify and fix performance bottlenecks; ensure scalability of infrastructure.
Dig deep to resolve complex issues.
Continuously improve automation; improve internal tooling and developer experience.
Contribute to incident response, postmortems, and development of best practices for reliability and scalability.

🎯 Requirements

4+ years of relevant industry experience; 2+ years leading large-scale projects or teams.
Experience with distributed systems at scale, with attention to reliability, scalability, and security.
Proven experience as a reliability or production engineer at fast-growing companies.
Cloud infrastructure (AWS, GCP, Azure) and Terraform.
Kubernetes and container orchestration.
Observability tools: Datadog, Prometheus, Grafana, ELK.
Microservices architecture and service mesh familiarity.
Security best practices in cloud environments.
