Added
18 days ago
Location
Type
Full time
Related skills
AI, cross-functional collaboration, LLMs
Description
- Define evaluation frameworks for agentic AI systems (quality, safety, latency)
- Evaluate frontier and fine-tuned models for quality, latency, and cost
- Partner with product, data science, and engineering to set launch criteria
- Analyze production issues; identify root causes and prioritize fixes
- Build visibility into agent performance via metrics and monitoring
- Based in Bellevue, WA or Menlo Park, CA; in-office 3 days/week
Requirements
- Deep experience measuring agentic/ML quality with evaluation frameworks, data, and scorecards
- Experience evaluating large language models and weighing tradeoffs among performance, cost, and latency
- Proven ability to analyze production issues and lead cross-team improvements
- Comfortable collaborating with engineers, data scientists, and product partners
- Experience with AI evaluation/observability tools or regulated environments (nice to have)
Benefits
- Challenging, high-impact work to grow your career
- Performance-based compensation with equity and bonuses
- Comprehensive benefits including health insurance for you and dependents
- Lifestyle wallet for wellness and learning
- Employer-paid life and disability insurance and mental health benefits
- Time off, holidays, parental leave, and more