Senior Site Reliability Engineer, Data Infrastructure

Added: 1 hour ago
Type: Full time

Related skills

Terraform, GitHub Actions, Helm, Grafana, Prometheus

📋 Description

  • Own the reliability and performance of our Kubernetes-based data platform.
  • Design and operate highly available, multi-region systems with uptime targets.
  • Scale infrastructure, improve deployment pipelines, and harden security posture.
  • Evolve DevSecOps practices while partnering with engineering for reliability from day one.

🎯 Requirements

  • 5+ years in SRE/Platform/Infra roles.
  • Kubernetes and containerized services expertise (cluster design, ops).
  • CI/CD with Argo CD and GitHub Actions.
  • Ownership of prod systems with HA ≥99.99%, incident response, SLI/SLO/SLA.
  • Geo-replicated multi-region active-active design (routing, failover, data consistency).
  • Observability with Prometheus, Grafana, OpenTelemetry.

🎁 Benefits

  • Medical, dental, and vision insurance - 100% paid by CoreWeave
  • 401(k) with generous employer match
  • Flexible PTO
  • Tuition Reimbursement
  • Ability to Participate in Employee Stock Purchase Program (ESPP)
  • Mental Wellness Benefits through Spring Health