Related skills
SQL, dbt, APIs, observability, LLMs

Description
- Design and build Hazel's evals platform end-to-end (scoring, datasets, CI/CD)
- Build observability for AI quality: hallucinations, accuracy, latency, and cost signals
- Architect data pipelines that turn advisor interactions into evaluation datasets, with privacy controls
- Build and steward golden datasets, working with subject-matter experts (SMEs) and advisors to define eval criteria
- Develop LLM verification agents to catch hallucinations, computational errors, and compliance violations
- Integrate evals into deployment pipelines to run regression tests before shipping
Requirements
- 8+ years of engineering experience, with at least 2 years in evaluation infra or ML platforms
- Deep familiarity with AI evaluation methods (RAG, docs, model assessment, human eval)
- Experience designing and curating golden datasets: sampling, inter-rater agreement, versioning, and edge cases
- Comfort across the stack: data engineering (SQL, dbt, warehouses) and API/backend integration
- Strong communication skills, with the ability to translate domain needs into precise, automatable eval criteria
- Bias toward shipping, and toward building tools engineers actually want to use
Benefits
- Hybrid work schedule for most positions
- Office spaces in Culver City, SF, and Dallas
- Competitive pay and equity for eligible positions
- Premium healthcare, dental, and vision insurance plans
- 401k with a 4% match and immediate vesting
- One-month work-from-anywhere policy
Help us maintain the quality of jobs posted on Empllo!