Related skills
Azure, AWS, Grafana, Prometheus, Python
Description
- Improve reliability, scalability, performance, and observability for JFrog SaaS.
- Define SLOs/SLIs, analyze failures, and support capacity planning.
- Support day-to-day ops of multi-cloud, Kubernetes-based SaaS.
- Build and enhance internal services/tools to reduce toil via automation.
- Develop Python/Go automation to improve deployment safety and incident visibility.
- Run PoCs and drive agentic automation using an ADK/agent framework.
Requirements
- 4+ years in SRE, DevOps, or production engineering.
- Kubernetes and containers (Docker), plus at least one cloud provider (AWS, GCP, or Azure).
- SRE Fundamentals: SLO/SLI, alerting, incident response, postmortems.
- Development: Python or Go for automation and internal tools.
- Observability: metrics/logs/traces with Prometheus, Grafana, OpenTelemetry.
- Incident & Resilience: strong incident response; DR readiness.
- CI/CD: Jenkins, ArgoCD, or equivalent.
- Soft Skills: documentation and collaborative problem solving.
Benefits
- Hybrid work model with in-office days in Bangalore.
- Opportunity to work on a global SaaS platform.
- Collaborative, impact-focused team culture.