Related skills: Python, evaluation, LLM, agents, datasets

Description
- Design and run evaluations of new AI capabilities
- Compare frontier models, agent systems, and tool workflows
- Turn emerging ideas into measurable benchmarks
- Define datasets, tasks, and scoring logic for experiments
- Design realistic workloads that reflect production environments
- Create tests that expose failure modes and edge cases
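The evaluation workflow described above (run models over a dataset of tasks, score the outputs, aggregate the results) can be sketched in a few lines of Python. The model function, dataset shape, and exact-match scorer here are illustrative assumptions, not a specific stack or API:

```python
def exact_match(prediction: str, expected: str) -> float:
    """Score 1.0 if the prediction matches the expected answer exactly."""
    return 1.0 if prediction.strip().lower() == expected.strip().lower() else 0.0

def run_eval(model_fn, dataset, scorer=exact_match):
    """Run model_fn over each example and return per-example results plus accuracy."""
    results = []
    for example in dataset:
        prediction = model_fn(example["prompt"])
        results.append({
            "prompt": example["prompt"],
            "prediction": prediction,
            "score": scorer(prediction, example["expected"]),
        })
    accuracy = sum(r["score"] for r in results) / len(results)
    return results, accuracy

# Toy usage: a stand-in "model" with canned answers, not a real API call.
dataset = [
    {"prompt": "2 + 2 = ?", "expected": "4"},
    {"prompt": "Capital of France?", "expected": "Paris"},
]
toy_model = lambda prompt: "4" if "2 + 2" in prompt else "Paris"
results, accuracy = run_eval(toy_model, dataset)
print(accuracy)  # 1.0
```

In practice `model_fn` would wrap a model or agent API call, and `scorer` would be swapped for task-appropriate logic (rubric grading, fuzzy match, execution checks), but the dataset/task/scorer separation is the core structure.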
Requirements
- Built or contributed to evaluation systems for LLM or agent applications
- Designed experiments comparing models, prompts, or AI architectures
- Written Python code to run tests across models or APIs
- Built datasets or scoring logic for AI quality measurement
- Investigated model failures or unexpected behaviors
- Published technical blog posts, research notes, or engineering write-ups
Benefits
- Medical, dental, and vision insurance
- Daily lunch, snacks, and beverages
- Flexible time off
- Competitive salary and equity
- AI Stipend