Monitor, evaluate, and optimize AI/LLM workflows in production environments. Ensure reliable, efficient, and high-quality AI system performance by building out a self-serve LLM Ops platform for the engineering and data science departments.
Key Responsibilities:
- Collaborate with data scientists and software engineers to integrate an LLM Ops platform (Opik by CometML) with existing AI workflows
- Identify valuable performance metrics (accuracy, quality, etc.) for AI workflows and create ongoing sampling evaluation processes using the LLM Ops platform that alert when metrics drop below thresholds (a minimal sketch of this pattern follows below)
- Collaborate across teams to create datasets and benchmarks for new AI workflows
- Run experiments on datasets and optimize performance via model changes and prompt adjustments
- Debug and troubleshoot AI workflow issues
- Optimize inference costs and latency while maintaining accuracy and quality
- Develop automations for LLM Ops platform integration that empower data scientists and software engineers to self-serve integration with the AI workflows they build

Requirements:
- Strong Python programming skills
- Experience with generative AI models and tools (OpenAI, Anthropic, Bedrock, etc.)
- Knowledge of fundamental statistical concepts and tools in data science, such as heuristic and non-heuristic NLP measurements (BLEU, WER, sentiment analysis, LLM-as-judge, etc.), standard deviation, and sampling rates (an LLM-as-judge call is sketched below)
- High-level understanding of how modern AI models work (knowledge cutoffs, context windows, temperature, etc.)
- Familiarity with AWS
- Understanding of prompt engineering concepts
- People skills: you will be expected to collaborate frequently with other teams to help perfect their AI workflows

Experience Level:
- 4-7 years of experience in LLM/AI Ops, MLOps, Data Science, or MLE
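For illustration, here is a minimal sketch of the sampling-evaluation-with-alerting pattern described in the responsibilities, in plain Python. The `score_response` stub, the sample rate, the threshold, and the alert hook are all hypothetical placeholders; in practice scoring and alerting would run through the LLM Ops platform (e.g. Opik), whose specific API is omitted here.

```python
import random
import statistics

def score_response(prompt: str, response: str) -> float:
    """Hypothetical scorer returning a quality score in [0, 1].
    In practice this might be a platform metric or an LLM-as-judge call;
    it is stubbed here so the sketch stays runnable."""
    return random.uniform(0.6, 1.0)  # placeholder score

def sampled_evaluation(traces, sample_rate=0.1, threshold=0.8):
    """Score a random sample of production traces and alert on regressions."""
    sample = [t for t in traces if random.random() < sample_rate]
    if not sample:
        return None
    scores = [score_response(t["prompt"], t["response"]) for t in sample]
    mean = statistics.mean(scores)
    if mean < threshold:
        # Placeholder alert hook; a real setup might page on-call or
        # post to a channel via the platform's alerting integration.
        print(f"ALERT: mean quality {mean:.2f} below threshold {threshold}")
    return mean

# Example run over fabricated traces:
traces = [{"prompt": f"q{i}", "response": f"a{i}"} for i in range(1000)]
print(sampled_evaluation(traces))
```

The sample rate trades evaluation cost against how quickly a regression is detected; the threshold would be tuned per workflow from the benchmarks mentioned above.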
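And a hedged sketch of the LLM-as-judge measurement named in the requirements, using the OpenAI Python client. The judge prompt, the `gpt-4o-mini` model choice, and the 1-5 rubric are illustrative assumptions, not a prescribed setup.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical grading rubric; a real one would be workflow-specific.
JUDGE_PROMPT = """Rate the answer to the question on a 1-5 scale for accuracy.
Question: {question}
Answer: {answer}
Reply with only the number."""

def llm_judge_score(question: str, answer: str, model: str = "gpt-4o-mini") -> int:
    """Use a grader model to score one output; the model choice is an assumption."""
    resp = client.chat.completions.create(
        model=model,
        temperature=0,  # keep grading as deterministic as the model allows
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
    )
    # A robust version would validate the reply before parsing.
    return int(resp.choices[0].message.content.strip())

# Example: score = llm_judge_score("What is 2+2?", "4")
```

Scores from a judge like this would feed the sampled evaluation above alongside heuristic measurements such as BLEU or WER.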