Related skills
gitlab datadog javascript docker bash
Description
- Manage incident war rooms, escalate and coordinate response for critical alerts.
- Perform deep-dive troubleshooting for application, transaction, and technical issues.
- Research Fireblocks blockchain workflows, identify optimizations, and improve monitoring.
- Identify root causes of incidents and coordinate outage response across multiple teams.
- Improve alerting for infrastructure, services, and business logic.
- Document actions in runbooks and automate with Python, Lambda, shell scripts, ArgoCD, and Ansible.
Requirements
- 3+ years of experience as an SRE incident response engineer.
- Curious, self-motivated, responsible, and production-aware; able to move a project from POC to production and communicate decisions.
- Experience with Python, JavaScript, Bash.
- 3+ years of experience with alerting and monitoring systems such as Datadog, Coralogix, Splunk, New Relic, or Prometheus.
- Experience working with Linux systems from kernel to shell.
- Cloud platforms such as AWS, Google Cloud, and Azure.
- Docker, Kubernetes, and Helm.
- Git, Bitbucket, GitLab.
- Strong analytical and troubleshooting skills, with good verbal and written communication.
Benefits
- Remote-friendly with a global team and follow-the-sun coverage.
- Generous benefits package and equity opportunities.
- Remote-first culture with opportunities for professional growth in SRE.
- Exposure to cutting-edge observability and security tooling.