Forward Deployed Site Reliability Engineer (TS/SCI Required)

Added: 21 days ago
Type: Full time
Salary: Not provided

Related skills

Docker, Terraform, AWS, Grafana, Docker Compose

📋 Description

  • Define and track SLIs/SLOs for services in the customer environment.
  • Use error budgets to drive reliability conversations with the Arlington team.
  • Eliminate toil by automating repetitive tasks in the secure enclave.
  • Conduct post-incident reviews and root-cause analysis with the team.
  • Own on-site observability: dashboards, alerts, and logs with the LGTM stack.
  • Lead on-site incident response: triage, containment, and customer comms.

🎯 Requirements

  • 5+ years in SRE, production operations, or a related infrastructure role.
  • Proven experience defining/tracking SLIs, SLOs, and error budgets.
  • Hands-on Docker, Docker Compose, and AWS in production.
  • Linux/Unix administration in constrained environments.
  • Terraform for infra provisioning with policy guardrails.
  • LGTM stack (Grafana, Loki, Prometheus/Mimir) and strong incident response skills.

🎁 Benefits

  • Equal opportunity employer.
  • Reasonable accommodation during the hiring process.
  • Opportunity to work on mission-critical, national-security programs.