Related skills: Looker, Snowflake, SQL, Testing, Python

Description
- Lead the AI Evaluation team, including staffing, coaching, and delivery of evaluation frameworks.
- Oversee AI evaluation lifecycle from pre-launch testing to post-deployment health monitoring.
- Operationalize human-in-the-loop testing and feed reviewer feedback into improvement loops.
- Oversee simulation environments to stress-test LLMs and detect hallucinations.
- Partner with AI Platform & Governance to implement metrics, reporting, and health signals.
- Develop dashboards and reporting to track evaluation coverage, accuracy, and confidence.
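As a rough illustration of the "evaluation coverage, accuracy, and confidence" reporting mentioned above, the sketch below aggregates human-reviewer labels into simple dashboard metrics. All names here (`prompt_id`, `reviewed`, `label`, `evaluation_metrics`) are hypothetical and not taken from the posting; real pipelines would likely compute these in SQL against a warehouse such as Snowflake.

```python
# Hypothetical sketch: roll up human-reviewer labels into coverage and
# accuracy metrics for an LLM evaluation dashboard. Field names are
# illustrative assumptions, not part of any real schema.

def evaluation_metrics(records):
    """records: list of dicts with keys 'prompt_id', 'reviewed' (bool),
    and 'label' ('pass' or 'fail', present only when reviewed)."""
    total = len(records)
    reviewed = [r for r in records if r["reviewed"]]
    # Coverage: share of prompts that received a human review.
    coverage = len(reviewed) / total if total else 0.0
    # Accuracy: share of reviewed prompts the reviewers marked as passing.
    passes = sum(1 for r in reviewed if r["label"] == "pass")
    accuracy = passes / len(reviewed) if reviewed else 0.0
    return {"coverage": coverage, "accuracy": accuracy}

sample = [
    {"prompt_id": 1, "reviewed": True, "label": "pass"},
    {"prompt_id": 2, "reviewed": True, "label": "fail"},
    {"prompt_id": 3, "reviewed": False},
]
print(evaluation_metrics(sample))
```

In practice these two numbers would feed a Looker dashboard alongside trend lines and confidence intervals, but the aggregation logic stays this simple at its core.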
Requirements
- 7+ years in AI/ML operations, quality, or evaluation; 2+ years people leadership.
- Deep understanding of LLM behavior, prompt testing, and evaluation methodologies.
- Familiarity with human-in-the-loop frameworks and prompt testing tools.
- Strong program management and stakeholder communication skills.
- SQL and Python proficiency; Looker or Snowflake experience.
- Experience collaborating with Engineering, Data Science, and Risk/Compliance on AI initiatives.
Benefits
- Four days in-office per week; Fridays from home for employees near an office.
- Backup child, elder, and pet care, plus subsidized commuter benefit.
- Competitive salary based on experience.
- 401k match and medical/dental/vision benefits.
- Generous vacation policy and company-wide events.
- Parental leave, Maven program, and community involvement.