Added: 4 days ago
Type: Full time
Salary: Not provided

Related skills

SQL, QA, prompt engineering, Playwright, RAG

📋 Description

  • Design test strategies for LLM features, prompts, and hallucination detection
  • Build automated evaluation pipelines (eval sets, golden data, LLM-as-judge) to catch quality regressions
  • Perform black-box and exploratory testing of AI features on web and mobile for accuracy, safety, and edge cases
  • Define AI quality metrics and release readiness thresholds
  • Collaborate with engineers, PMs, ML/AI engineers, and clinicians to define what good AI looks like
  • Investigate and triage AI failure modes: model, prompt, retrieval, and integration issues

🎯 Requirements

  • 5+ years of software QA experience, including at least 1 year testing LLM/AI features
  • Strong grasp of QA principles, including test case creation and documentation for deterministic and non-deterministic systems
  • Hands-on experience with LLM tooling: prompt engineering, RAG, eval frameworks, LLM APIs (OpenAI, Anthropic)
  • Experience designing automated qualitative evaluation (LLM-as-judge, rubric scoring, golden datasets)
  • Proficiency with test automation tools, with an emphasis on Playwright
  • Strong SQL for data validation and test data creation

🎁 Benefits

  • Medical, dental, and vision coverage, with the option to extend to your dependents
  • Company-sponsored short-term insurance
  • Fully paid 8-week parental leave after 6 months of employment
  • Company-sponsored 401(k) after 3 months of employment
  • Unlimited vacation for salaried roles
  • Bi-annual company offsites and a work-from-home stipend