Related skills
SQL, Python, PyTest, OpenAI Evals, RAG evaluators
Description
- Converse with the model on real-world prompts
- Verify factual accuracy and logical soundness
- Design and run test plans and regression suites
- Build rubrics and pass/fail criteria
- Capture reproducible error traces and root causes
- Suggest improvements to prompts, guardrails, and metrics
Requirements
- Advanced degree in CS, data science, linguistics, or statistics
- Shipped QA for ML/AI systems; safety/red-team experience
- Test automation frameworks (e.g., PyTest)
- Hands-on with LLM eval tooling (OpenAI Evals, RAG evaluators, W&B)
- Strong rubric design, adversarial testing, regression at scale
- Clear communication and