Related skills
python pandas pytorch elasticsearch rag
Description
- Define evaluation strategy for conversational/agentic search (offline/online).
- Lead quality metrics and decision frameworks for RAG, agents, tooling.
- Build and compare retrieval and re-ranking: dense/sparse, vector search, context enrichment.
- Turn experiments into product decisions: models, routing, tool exposure.
- Partner with engineering to productionize evaluation pipelines, telemetry, dashboards, CI guardrails.
- Mentor data scientists and engineers in experiment design and evaluation for LLM-powered systems.
Requirements
- 8+ years of applied DS/ML with IR, NLP, ranking, RAG, or LLM-powered products.
- Proven track record defining/evaluating production AI/ML systems (offline/online).
- Hands-on Python, PyTorch/Transformers, Pandas, notebooks, reproducible experiments.
- Deep understanding of retrieval systems: dense/sparse, vector search, re-ranking, metrics like nDCG, MRR.
- Experience collaborating with engineering to move from prototype to production (telemetry, dashboards, CI).
- Elasticsearch experience or similar search/distributed data systems; ES|QL a plus.
Benefits
- Competitive pay based on the work you do.
- Health coverage for you and your family in many locations.
- Flexible locations and schedules for many roles.
- Generous number of vacation days each year.
- Company matches up to $2,000 for donations and service.
- Up to 40 hours per year to use toward volunteer projects.