Type: Full time
Related skills: RAG, benchmarking, evaluation, LLM fine-tuning
Description
- Build LLM-powered evaluation pipelines for AI usage skills at scale
- Own end-to-end evaluation rubric, model application, and bias audits
- Design experiments to determine what good evaluation looks like
- Build RAG pipelines and fine-tuning workflows for evaluation models
- Define benchmarking infrastructure to detect improvements and catch regressions
- Translate model behavior into outcomes for PMs, customers, and candidates
Requirements
- Shipped LLM-powered systems in production
- Rigorous evaluation mindset with clear metrics
- Experience designing and applying evaluation methodologies at scale
- Systems thinking across data, model, and serving layers
- Able to explain ML judgments to non-ML engineers
- Nice-to-have: experience with evaluation frameworks for generative AI or psychometrics
Benefits
- Equity (stock options) and a comprehensive benefits package
- Hybrid work in Santa Clara, CA
- Opportunity to influence AI evaluation at scale
- Collaborative, fast-paced engineering culture