Monitoring and Observability Architect at DriveWealth
DriveWealth is seeking a senior Monitoring and Observability Architect to design, implement, and scale a robust observability platform across our fintech infrastructure. You will collaborate with platform engineering, SRE, and software engineering teams to define strategies for metrics, traces, and logs, while driving reliability and performance improvements.
Responsibilities
- Design and implement a comprehensive observability strategy across services, databases, and cloud environments.
- Build and maintain monitoring pipelines using Prometheus, Grafana, OpenTelemetry, Jaeger, and related tooling.
- Define SLIs and SLOs, and implement alerting and incident response processes that minimize mean time to detect (MTTD) and mean time to recover (MTTR).
- Architect scalable observability for cloud-native services running on Kubernetes across AWS/GCP/Azure.
- Automate instrumentation, dashboards, alert routing, and on-call workflows; continuously improve reliability and performance.
- Lead incident post-mortems and drive reliability improvements with cross-functional teams.
- Mentor engineers and promote best practices for instrumentation, monitoring, and reliability.
Requirements
- 7+ years of experience in DevOps, SRE, or Observability roles.
- Deep hands-on experience with Prometheus, Grafana, OpenTelemetry, and distributed tracing tools (Jaeger, Zipkin, etc.).
- Strong knowledge of cloud platforms (AWS, GCP, or Azure) and Kubernetes-based deployments.
- Proficiency in at least one programming language for instrumentation (Python, Go, or similar).
- Excellent collaboration and communication skills; ability to influence across teams.
Nice to have
- Experience with service meshes (Istio, Linkerd) and security monitoring.
- Familiarity with cost-aware monitoring and observability budgeting.
About DriveWealth
DriveWealth is a fintech company delivering modern investing technology and API-driven brokerage services to clients worldwide. This is a remote-friendly role.
Benefits
- Competitive compensation and equity package.
- Remote-first culture with flexible work hours.
- Comprehensive health benefits and retirement plans.