Added
21 minutes ago
Location
Type
Full time
Related skills
sre cloud infrastructure devops kubernetes incident management
Description
- Strategy & Team Leadership: Directly manage the DevOps, SRE, and DBRE infrastructure teams and align their priorities under a unified reliability strategy. Set objectives, drive execution, and ensure resources are focused on high-impact reliability investments.
- Platform Reliability & Incident Prevention: Conduct ongoing risk assessments of Filevine's platform; identify areas of fragility and drive proactive hardening to reduce unplanned downtime.
- Reliability Metrics & Reporting: Define and track uptime/availability, MTTD, MTTR, and incident frequency. Own reporting to make platform health visible to leadership and product teams.
- Status Page & Incident Communication: Manage updates to status.filevine.com during reliability events; define criteria for posting incidents and ensure timely, accurate updates for customers and internal stakeholders.
- Cross-Functional Alignment: Bridge SRE, Product, Engineering, and customer-facing teams so that plans reflect reliability priorities, and translate reliability trends into actionable insights for non-technical stakeholders.
- Infrastructure & Tooling: Evaluate, implement, and manage the reliability and observability stack; drive decisions on monitoring, alerting, test environments, and tooling to scale the platform.
Requirements
- 5+ years of experience in SRE, DevOps, platform engineering, or reliability-focused product/program management in SaaS.
- Software Engineering Background: Hands-on experience as a software engineer; comfortable reading code and discussing architecture.
- SRE & Infrastructure Expertise: Strong understanding of site reliability principles, cloud infrastructure, database reliability, container orchestration, and modern DevOps practices.
- Risk Assessment & Data Proficiency: Strong analytical skills, with the ability to use data sources (monitoring platforms, Pendo, Domo, Salesforce, incident logs) to prioritize reliability work by business impact.
- Communication Mastery: Ability to translate complex reliability data into clear narratives for leadership and cross-functional partners; experience leading incident reviews.
- SDLC & Release Lifecycle Knowledge: Deep understanding of SDLC, release protocols, and incident response; ability to identify high-leverage reliability investments.
- Education: B.S./M.S. in computer science or a related field, or equivalent direct work experience with a demonstrated track record in software engineering and/or SRE.
Benefits
- Medical, Dental, & Vision Insurance (for full-time employees)
- Competitive & Fair Pay
- Paid time off policy and comprehensive benefits package
- Maternity & paternity leave (for full-time employees)
- Short & long-term disability
- Opportunity to learn from a dedicated leadership team