Integration Reliability Engineer, Technical Operations, Connect at Stripe
Overview
Stripe is seeking an Integration Reliability Engineer within the Technical Operations team for Connect. You will focus on the reliability, performance, and resilience of Stripe Connect integrations across partners and merchants.
Responsibilities
- Design, implement, and maintain monitoring, alerting, and observability for Connect integrations.
- Lead incident response and post-incident reviews to drive durable improvements.
- Collaborate with Platform and Partnerships teams to automate reliability work and reduce toil.
- Develop runbooks, playbooks, and disaster recovery strategies for Connect workloads.
- Improve deployment and release processes to increase reliability.
Requirements
- 3+ years of experience in site reliability engineering or integration-focused software engineering.
- Strong programming/scripting skills (e.g., Python, Go, or shell).
- Experience with cloud infrastructure (AWS, GCP) and containerization (Kubernetes).
- Familiarity with monitoring tools (Prometheus, Grafana) and incident management processes.
- Bachelor's degree in Computer Science or a related field.
Nice to have
- Experience with Stripe Connect or other payment platforms.
- Excellent collaboration and communication skills.