Site Reliability Engineer, Infrastructure - Analytics Platform

Type: Full time

Related skills

Terraform, Snowflake, Kubernetes, CI/CD, Kafka

📋 Description

  • Own infrastructure lifecycle: provisioning, upgrades, scaling, decommissioning (IaC-first)
  • Operate and scale ClickHouse clusters: sharding, replication, tuning
  • Operate Kafka ingestion backbone; improve throughput, lag, backpressure, recovery
  • Improve latency and reliability for data-heavy serving and query workloads
  • Build and maintain monitoring/alerting: SLIs/SLOs, dashboards, runbooks
  • Define, implement, and improve incident response standards and on-call practices

🎯 Requirements

  • Track record owning production infra for data-heavy, low-latency systems end to end
  • Strong hands-on experience operating ClickHouse, Kafka, and related data systems
  • Practical experience with Snowflake workflows and cross-system data architecture
  • Ability to define operational standards (runbooks, incident process) and enforce them
  • Strong operational experience with Kubernetes, Terraform, and cloud infra
  • Excellent communication and collaboration across engineering and research teams