Staff Developer – AI Evaluation & Reliability

Added: 19 days ago
Type: Full time
Salary: Not provided

Related skills

AWS, SQL, Python, LLMs, LangChain

📋 Description

  • Own and evolve the evaluation strategy for LLM- and agent-based systems: golden datasets, A/B tests (first sketch below).
  • Benchmark foundation model performance within Caseware’s domain; identify gaps and risks.
  • Lead RAG pipeline design: embeddings, retrieval strategies, reranking, quality metrics.
  • Design feedback/evaluation pipelines linking user behavior to improvements.
  • Define guardrails for agentic systems: schema validation, content filtering, tool governance (second sketch below).
  • Establish approval gates and rollout controls: feature flags, staged deployments, kill switches (third sketch below).
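
First, a minimal sketch of a golden-dataset evaluation harness. The `GoldenExample` type, the exact-match scorer, and the stubbed model call are illustrative assumptions, not anything specified in this posting; a real harness would plug in an LLM call and richer scorers.

```python
# Minimal golden-dataset evaluation loop (illustrative sketch).
from dataclasses import dataclass
from typing import Callable

@dataclass
class GoldenExample:
    prompt: str
    expected: str  # reference answer curated by a domain expert

def exact_match(candidate: str, expected: str) -> float:
    """Crude scorer: 1.0 on a normalized exact match, else 0.0."""
    return float(candidate.strip().lower() == expected.strip().lower())

def evaluate(generate_answer: Callable[[str], str],
             golden_set: list[GoldenExample]) -> float:
    """Run the model over the golden set and return the mean score."""
    scores = [exact_match(generate_answer(ex.prompt), ex.expected)
              for ex in golden_set]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    golden = [GoldenExample("What is 2 + 2?", "4")]
    # Stub model for illustration; a real run would call the LLM here.
    print(evaluate(lambda prompt: "4", golden))  # -> 1.0
```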
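Second, a guardrail sketch using Pydantic (v2) to validate an agent's structured output before a downstream tool executes it. The `ToolCall` schema and the return-None-on-failure policy are hypothetical choices for illustration.

```python
# Schema-validation guardrail for agent tool calls (illustrative sketch).
from pydantic import BaseModel, Field, ValidationError

class ToolCall(BaseModel):
    tool_name: str = Field(pattern=r"^[a-z_]+$")  # allow-listed naming style
    argument: str = Field(max_length=500)         # cap payload size

def validate_llm_output(raw_json: str) -> ToolCall | None:
    """Reject malformed or out-of-policy output instead of executing it."""
    try:
        return ToolCall.model_validate_json(raw_json)
    except ValidationError:
        return None  # caller can retry, fall back, or escalate to a human

print(validate_llm_output('{"tool_name": "search", "argument": "IFRS 16"}'))
print(validate_llm_output('not json at all'))  # -> None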
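Third, a kill-switch sketch: gating the agentic path behind a runtime flag so it can be disabled instantly, without a redeploy. The `AGENT_ENABLED` variable name and the fallback path are made-up examples, not a known Caseware mechanism.

```python
# Kill switch via a runtime flag (illustrative sketch).
import os

def agent_enabled() -> bool:
    """Read the flag on every request so a flip takes effect immediately."""
    return os.environ.get("AGENT_ENABLED", "false").lower() == "true"

def handle_request(query: str) -> str:
    if agent_enabled():
        return f"[agent path] {query}"   # would invoke the LLM agent
    return f"[fallback path] {query}"    # safe deterministic fallback

print(handle_request("summarize Q3 engagement"))
```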

🎯 Requirements

  • Strong data science foundation with Python, SQL, statistics, and experiment design.
  • Deep hands-on experience with LLMs, prompting strategies, and agent reasoning patterns.
  • Practical expertise with embeddings, vector databases, retrieval metrics, and reranking approaches (a recall@k sketch follows this list).
  • Proven experience designing or operating evaluation frameworks for generative AI or agentic systems (automated and human-in-the-loop).
  • Strong understanding of AI reliability, safety, and governance (guardrails, validation, monitoring, change control).
  • Nice-to-have: experience with LangChain or similar orchestration frameworks.
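
As an example of the retrieval metrics named above, here is a minimal recall@k implementation; the document IDs in the demo run are made up.

```python
# recall@k: fraction of relevant documents found in the top-k results.
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

print(recall_at_k(["d3", "d7", "d1"], {"d1", "d9"}, k=3))  # -> 0.5
```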

🎁 Benefits

  • 100% remote work and strong work-life balance.
  • Competitive compensation with above-market benefits.
  • Prepaid medical plan.
  • Life insurance and funeral assistance.
  • Internet allowance and home office stipend.
  • Mentorship and budget for training.