Site Reliability Engineer (Must have an active TS/SCI with polygraph)

Added: 12 days ago
Type: Full time
Salary: Not provided

Please note: This position requires a TS/SCI with polygraph. Only candidates who hold the appropriate clearance level will be considered.

This role supports a critical mission and requires an active U.S. Government Security Clearance at the TS/SCI level with a required polygraph.

We’re looking for a full-time Observability Engineer (OE) who is driven by understanding not just what is happening in complex, cloud-native systems—but why. You’ll be part of a highly collaborative data collection and software development team responsible for ensuring services meet the reliability, performance, and uptime expectations of our customers.

This is an environment where systems evolve quickly, and attention to detail matters. You’ll help us stay ahead by keeping a constant pulse on capacity, performance, and cost—while continuously improving how we see, understand, and respond to system behavior.

As an OE, you’ll design and build monitoring and observability solutions that give teams deep visibility into operational health. Your work will directly support mission success by enabling faster troubleshooting, stronger system insight, and better customer outcomes across the entire technology stack.

What You’ll Do

  • Define and uphold standards for monitoring reliability, availability, performance, and maintainability of sponsor-owned systems
  • Design and architect operational solutions that support both applications and infrastructure
  • Drive service acceptance by introducing new operational processes, monitoring strategies, and automation to reduce risk and repeat issues
  • Partner closely with service and product owners to define key performance indicators (KPIs) and identify meaningful trends
  • Provide deep, hands-on troubleshooting support for production issues
  • Work with service owners to quickly identify root causes and restore services during performance or availability incidents
  • Build or leverage tools that correlate data across multiple systems to accelerate root-cause analysis
  • Coordinate with the sponsor during major incidents, large-scale deployments, and SecOps user support activities

Required

  • Active/current TS/SCI with required polygraph
  • Bachelor’s degree in Computer Science or a related field
  • 5+ years of relevant engineering experience
  • Hands-on experience with Kubernetes, Docker, Helm, and CI/CD pipelines (e.g., Jenkins or Concourse)
  • Familiarity with distributed version control systems such as Git
  • Experience working in AWS cloud environments
  • Proven experience implementing monitoring and observability solutions across complex systems and data feeds
  • Proficiency in Python and Java scripting
  • Advanced knowledge of Unix/Linux, with strong command-line comfort
  • Willingness to work onsite full time and participate in on-call rotations
  • A collaborative mindset and a sense of ownership when things go wrong

Nice to Have

  • Experience with additional cloud providers beyond AWS
  • Familiarity with AWS CloudWatch or other native monitoring tools
  • Experience using Prometheus, Grafana, or similar tools for ETL pipelines, APIs, servers, networks, C2S services, and AI/ML platforms
  • Strong understanding of networking fundamentals
  • Experience with incident and problem management processes
  • Root Cause Analysis (RCA) experience
  • Exposure to ETL workflows and data pipelines
  • Organized, detail-oriented, and comfortable documenting and communicating work
  • Willingness to step into leadership roles during incidents, guiding others and driving issues to resolution

Why This Role Matters

Your work will help ensure systems supporting critical missions remain reliable, observable, and resilient. If you thrive in environments where accountability matters, collaboration is essential, and your impact is felt immediately—this role offers meaningful work with real-world consequences.

Salary Range: 185-225K

Benefits

At Aperio Global, we understand the value of investing in our most important asset—our employees. That's why we have crafted a comprehensive benefits package designed to help you make the best decision for yourself, your family and your lifestyle. For additional details, contact our talent acquisition team.

  • Health Care Plan (Medical, Dental & Vision)
  • Retirement Plan (401k, IRA)
  • Life Insurance (Basic, Voluntary & AD&D)
  • Paid Time Off (Vacation, Sick & Public Holidays)
  • Short Term & Long Term Disability
  • (and much more)

Aperio Global is committed to providing equal employment opportunities (EEO) to all employees and applicants. Employment decisions are made without discrimination or harassment, in accordance with applicable federal, state, and local laws. We adhere to all legal requirements prohibiting discrimination based on race, color, religion, sex, national origin, age, disability, genetic information, veteran status, or any other characteristic protected by law.
