Staff/Sr Software Engineer, Compute Capacity

Added: 11 days ago
Type: Full-time

Related skills

BigQuery, AWS, SQL, Grafana, Prometheus

📋 Description

  • Build and operate data pipelines ingesting occupancy, utilization, and cost data into BigQuery.
  • Develop observability infrastructure (Prometheus, Grafana) and alerts for fleet health.
  • Instrument compute efficiency across training, inference, and eval workloads; build benchmarking.
  • Build internal tooling for capacity planning, workload attribution, and cluster debugging.
  • Operate Kubernetes-native systems at scale; deploy data collectors; manage labeling and taints.
  • Normalize and reconcile data across AWS, GCP, and Azure billing exports and telemetry formats.

🎯 Requirements

  • 5+ years of software engineering experience with production systems.
  • Kubernetes fluency at operational depth: scheduling, taints, labels, node management.
  • Data pipeline engineering: pipelines, BigQuery, schema management, streaming, SLOs.
  • Observability tooling: Prometheus, PromQL, Grafana; recording rules.
  • Python and SQL at production quality; BigQuery SQL.
  • Familiarity with AWS, GCP, or Azure at infrastructure level; multi-cloud experience.

🎁 Benefits

  • Competitive compensation and benefits.
  • Optional equity donation matching.
  • Generous vacation and parental leave.
  • Flexible working hours.
  • Collaborative office space.

🛃 Visa sponsorship
