Senior Site Reliability Engineer

Added
2 hours ago
Type
Full time
Salary
Salary not provided

Related skills

Rust, Java, Prometheus, Python, Kubernetes

📋 Description

  • Act as Incident Commander during major incidents; run war rooms.
  • Instrument code to expose metrics and traces; define SLOs with owners.
  • Write prod-ready code (Java/Go/Python) for tooling and self-healing.
  • Collaborate with Product Engineering to build in reliability and observability from day one.
  • Analyze performance and traffic; run load tests and chaos experiments.

🎯 Requirements

  • 4+ years in SRE/Backend; strong Java/Go/Python/Rust coding.
  • Deep understanding of distributed systems and microservices.
  • GCP/AWS/Azure; production Kubernetes workloads (GKE/EKS).
  • Observability design with OpenTelemetry, Prometheus, Datadog, or SigNoz.
  • PostgreSQL/MySQL and Kafka/RabbitMQ in high-throughput environments.

🎁 Benefits

  • Culture - People-first, inclusive environment.
  • Learning - Regular internal technical talks and growth.
  • Compensation - Attractive salary, pension, health insurance, annual bonus.