Lead Cloud Infrastructure Engineer / Site Reliability Engineer (SRE)

Type: Full time

Related skills

gitops terraform python kubernetes eks

📋 Description

  • Collaborate with software teams to ensure reliability, performance, and security of Federal region infrastructure.
  • Design, deploy, and scale AI/ML/LLM infra across AWS, Azure, or GCP.
  • Manage Kubernetes (EKS/AKS/GKE) for AI services and data pipelines.
  • Build/automate data and model pipelines for AI workloads using Terraform, Python, CI/CD.
  • Use GitOps, CI/CD, Docker, Kubernetes to streamline ML/LLM tasks.
  • Implement monitoring/observability with Prometheus, Grafana, ELK/EFK, Langfuse.

🎯 Requirements

  • Bachelor’s or Master’s in CS/Engineering or equivalent.
  • 8+ years in SRE, DevOps, Platform, MLOps, or Cloud Infra.
  • 4+ years production Kubernetes (EKS/GKE/AKS) and Docker.
  • Strong Python; proficiency in Bash, Go, or PowerShell.
  • IaC with Terraform or CloudFormation.
  • U.S. citizenship at hire; reside in contiguous US; willing to undergo a background check.

🎁 Benefits

  • Equity and additional benefits.
  • Remote-friendly, globally distributed team with work-from-home options.
  • Opportunity to work on FedRAMP-compliant cloud infrastructure.