Added
1 minute ago
Location
Type
Full time
Related skills
grafana prometheus python kubernetes go
Description
- Design and build scalable telemetry ingest and storage pipelines across multi-cluster infrastructure.
- Own and evolve core observability platforms; drive migrations and improvements.
- Build instrumentation libraries, SDKs, and integrations for high-quality telemetry.
- Drive alerting and SLO infrastructure to monitor reliability targets with minimal noise.
- Reduce MTTR via cross-signal correlation and unified query interfaces.
- Partner with Research, Inference, Product, and Infrastructure teams to tailor observability to their needs.
Requirements
- 10+ years of experience building and operating large-scale observability infrastructure.
- Deep experience with at least one observability signal area (metrics, logs, tracing, or analytics).
- Understanding of high-throughput data pipelines, columnar storage, and telemetry data at scale.
- Experience operating or building on Prometheus, Grafana, ClickHouse, OpenTelemetry, or similar.
- Strong proficiency in Python, Rust, or Go and ability to collaborate effectively.
- Excited about building foundational infrastructure and comfortable with ambiguous, high-impact challenges.
Benefits
- Optional equity donation matching
- Generous vacation and parental leave
- Flexible working hours
- Lovely office space to collaborate with colleagues
Visa sponsorship