About AvePoint
AvePoint is seeking a Site Reliability Engineer (SRE) to support GovTech initiatives in Singapore. This permanent, full-time role sits within our APAC team and focuses on the reliability, scalability, and performance of mission-critical systems.
Role overview
You will design, implement, and operate cloud-based services and containerized workloads, with a strong emphasis on automation. You will also participate in on-call rotations and drive improvements to incident response.
What you'll do
- Design, implement, and maintain scalable, highly available systems and services.
- Monitor production systems and respond to incidents to minimize impact.
- Lead incident response, root cause analysis, and post-incident reviews.
- Build observability through metrics, logs, traces, and dashboards (for example, Prometheus and Grafana).
- Automate infrastructure and deployments using Infrastructure as Code (Terraform, CloudFormation).
- Manage CI/CD pipelines and deployment processes.
- Collaborate with software engineering, security, and operations teams.
- Participate in on-call rotations and ensure effective incident communication.
Qualifications
- Strong experience in Linux/Unix environments.
- Hands-on experience with Kubernetes, Docker, and cloud platforms such as AWS, GCP, and Azure.
- Infrastructure as Code experience with Terraform (or CloudFormation).
- Scripting skills in Python or Bash; familiarity with Go is a plus.
- Observability and monitoring expertise using Prometheus, Grafana, and the ELK/Elastic Stack.
- Working knowledge of security best practices and incident management.
Nice to have
- GovTech or public sector experience.
- Familiarity with AvePoint products and data protection technologies.
- Bachelor's degree in Computer Science or a related field.
Benefits
Competitive salary and benefits package, professional development opportunities, health insurance, and flexible work arrangements.
Location
Singapore (APAC)
How to apply
Apply via the AvePoint careers page: https://www.avepoint.com/careers/job-detail?gh_jid=6788033