Intermediate Site Reliability Engineer (SRE) – AI Reliability & Automation

Added: 10 days ago
Type: Full time

Related skills

Datadog, Azure, Java, Docker, Terraform

📋 Description

  • AI-Driven Observability: Apply ML-based anomaly detection and pattern recognition to observability data.
  • Intelligent Automation: Develop event-driven workflows and self-healing systems.
  • Predictive Reliability: Use time-series forecasting to anticipate failures.
  • Software Engineering for Resilience: Build scalable, fault-tolerant cloud-native systems.
  • Team Enablement: Run AIOps workshops and promote AI maturity.

🎯 Requirements

  • Languages and IaC: Python, Java, Bash, Terraform
  • Platforms: Azure, Kubernetes, Docker
  • Tools: Datadog, Prometheus, AppDynamics, ELK, GitHub Actions
  • ML/AI: Model Context Protocol (MCP), AI agents, LangChain, vector stores, retrieval-augmented generation (RAG)
  • CI/CD: Jenkins, ArgoCD, Spinnaker
  • Databases: SQL Server, PostgreSQL, MySQL
  • 5+ years' experience in software engineering
  • Experience with SRE principles and AI/ML in production

🎁 Benefits

  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program