Site Reliability Engineer

Added: 22 days ago
Type: Full time

Related skills

Datadog, Terraform, GitHub Actions, Python, Kubernetes

📋 Description

  • Define and maintain SLOs/SLAs that balance user experience with development velocity
  • Implement monitoring and alerting in Datadog for production issues
  • Build resilient architectures that gracefully handle failures
  • Establish error budgets that balance feature velocity against stability
  • Lead incident response as primary on-call for infrastructure
  • Conduct blameless post-mortems to prevent recurrence

🎯 Requirements

  • Infrastructure as Code (Terraform or CloudFormation)
  • Kubernetes expertise: networking, storage, security contexts
  • Python or Go for tooling and automation
  • CI/CD pipelines and deployment strategies (GitHub Actions, GitLab CI, Jenkins)
  • Observability: Datadog, Prometheus, Grafana
  • Linux/Unix administration and cloud providers (AWS, GCP, Azure)

🎁 Benefits

  • 401(k) and health, dental, and vision benefits
  • Flexible remote work with home office stipend
  • Company laptop and role-specific tech
  • Hybrid NYC office with amenities
  • Competitive PTO and team socials
  • Unlimited professional development fund