Senior Site Reliability Engineer

Added
11 minutes ago
Type
Full time

Related skills

react terraform aws python kubernetes

📋 Description

  • Own fleet reliability for Portal's SaaS infra, incl. LLM workflows.
  • Define SLOs and capacity plans to scale our product.
  • Architect infra on GCP and AWS using Terraform for AI workloads.
  • Drive incident management, on-call, postmortems; enable self-healing.
  • Lead fullstack reliability across TypeScript, React, Python.
  • Mentor engineers and shape infra roadmap with AI features.

🎯 Requirements

  • 5+ years operating cloud infra (GCP/AWS), with Terraform and Kubernetes.
  • Experience with LLM-based systems, RAG pipelines, or agentic workloads.
  • Distributed systems: consistency, availability, partition tolerance.
  • Proficient in TypeScript, Java, Go, or Python; navigate large codebases incl. AI PRs.
  • Build automation to prevent operational issues.
  • Clear communicator; write postmortems that drive change.

🎁 Benefits

  • Health insurance
  • Six-month paid parental leave
  • 401(k) retirement plan
  • Monthly meal allowance
  • 23 paid days off
  • Paid holidays and sick leave