Intermediate Site Reliability Engineer (SRE) – AI Reliability & Automation
Related skills
datadog azure java docker terraform
📋 Description
- AI-Driven Observability: Apply ML-based anomaly detection and pattern recognition.
- Intelligent Automation: Develop event-driven workflows and self-healing systems.
- Predictive Reliability: Use time-series forecasting to anticipate failures.
- Software Engineering for Resilience: Build scalable, fault-tolerant cloud-native systems.
- Team Enablement: Run AIOps workshops and promote AI maturity.
🎯 Requirements
- Languages: Python, Java, Bash, Terraform
- Platforms: Azure, Kubernetes, Docker
- Tools: Datadog, Prometheus, AppDynamics, ELK, GitHub Actions
- ML/AI: MCP framework, AI agents, LangChain, vector stores, RAG
- CI/CD: Jenkins, ArgoCD, Spinnaker
- Databases: SQL Server, PostgreSQL, MySQL
- 5+ years' experience in software engineering
- Experience with SRE principles and AI/ML in production
🎁 Benefits
- Retirement Plan Matching
- Flexible Paid Time Off
- Wellness Support Programs and Resources
- Parental & Caregiver Leaves
- Fertility & Adoption Support
- Continuous Development Support Program