Overview
Alchemy is seeking a Site Reliability Engineer to join our team in Bucharest, Romania. This is a full-time role focused on building and maintaining reliable, scalable infrastructure to power our products. You will collaborate with software engineers, platform teams, and security to improve system reliability, performance, and incident response.
Responsibilities
- Design, implement, and operate highly available and scalable services.
- Develop and maintain monitoring, alerting, and incident response processes.
- Build and manage resilient infrastructure on cloud platforms.
- Automate deployment, provisioning, and maintenance using infrastructure-as-code tools.
- Participate in on-call rotations, respond to incidents, and conduct post-incident reviews.
- Collaborate with development teams to improve performance, reliability, and security.
Qualifications / Requirements
- Experience in Site Reliability Engineering, Production operations, or DevOps.
- Strong Linux skills and networking fundamentals; familiarity with observability tools (e.g., Prometheus, Grafana, ELK/EFK).
- Hands-on experience with Kubernetes and container orchestration; cloud provider experience (AWS/Azure/GCP).
- Scripting and automation skills (e.g., Python, Bash).
- Effective communication and problem-solving abilities.
Nice-to-have
- Experience with Terraform, Ansible, Helm, and CI/CD pipelines.
Benefits
- Competitive salary and benefits package.
- Opportunities for growth and professional development.
- Collaborative, mission-driven team culture.