Replit is the agentic software creation platform that enables anyone to build applications using natural language. With millions of users worldwide and over 500,000 business users, Replit is democratizing software development by removing traditional barriers to application creation.

Replit is redefining how software is built, and who gets to build it. Our mission is to achieve Autonomy for All: making programming accessible, collaborative, and powered by AI. To realize this vision, we are building a brand that is as iconic, inventive, and human as the product itself.

You'll directly impact Replit's AI agent—the core of our product strategy—by defining how we measure success, designing experiments that drive improvements, and turning agent trace data into actionable insights for the AI team and company leadership.

You will:

Design and analyze experiments to measure agent improvements, from model changes to UX variations, balancing statistical rigor with practical tradeoffs.
Define success metrics that connect agent trace data (prompts, responses, code changes, execution outcomes) to user outcomes like successful deploys, retention, and revenue.
Build the semantic layer for agent data in partnership with data engineering—defining the tables, metrics, and models that enable self-serve analysis across the AI team.
Surface insights from trace analysis that identify failure modes, successful patterns, and opportunities to improve agent effectiveness.
Partner with AI engineering, product, and leadership to translate data into roadmap decisions; you'll have a seat at the table for critical agent strategy discussions.
Create dashboards and reporting that surface agent performance metrics (task completion, latency, quality scores, user satisfaction) for the AI team and executives.

Examples of what you could do:

Design an experiment to measure whether a new model improves task completion rates, accounting for user heterogeneity and novelty effects.
Build outcome-linked data models that connect agent trajectories to downstream success (deployments, user satisfaction, retention).
Develop evaluation frameworks for agent quality that can be reused as benchmarks—similar to how LLMs have standard evals.
Investigate why agent performance varies across coding tasks, languages, or user segments—and recommend targeted improvements.

Required skills and experience:

5+ years of experience in data science, analytics, or a quantitative role with a focus on product, growth, or experimentation.
Deep experimentation expertise: A/B testing, experiment design, power analysis, handling skewed data, interpreting results beyond p-values.
Strong SQL skills and experience designing data models for high-volume event data; experience with dbt or similar transformation tools.
Proficiency in Python and data science libraries (pandas, scipy, statsmodels, etc.).
Ability to translate ambiguous questions into structured analysis and communicate findings clearly to both technical and non-technical stakeholders.
Bias toward action: you ship insights that influence decisions, not just dashboards.

Preferred Qualifications:

Experience with LLM or AI agent evaluation—understanding of prompt-response patterns, agent evaluation frameworks, or model quality measurement.
Background in high-growth SaaS or PLG companies with large-scale event data.
Experience with the modern data stack (BigQuery, dbt, Fivetran, Segment, Hex).
Familiarity with experimentation platforms (LaunchDarkly, Statsig, Eppo, or similar).
Understanding of developer tools or software engineering workflows.

Bonus Points:

You've built agent or LLM evaluation frameworks from scratch.
Experience with causal inference methods (difference-in-differences, synthetic control, CUPED).
Familiarity with real-time data systems or operational analytics for monitoring agent performance.
Experience working with trace data, logging systems, or observability tooling.

This is a full-time role based in our Foster City, CA office, with required in-office days on Monday, Wednesday, and Friday.

Full-Time Employee Benefits Include:

💰 Competitive Salary & Equity

💹 401(k) Program

⚕️ Health, Dental, Vision and Life Insurance

🩼 Short Term and Long Term Disability

🚼 Paid Parental, Medical, Caregiver Leave

🚗 Commuter Benefits

📱 Monthly Wellness Stipend

🧑‍💻 Autonomous Work Environment

🖥 In Office Set-Up Reimbursement

🏝 Flexible Time Off (FTO) + Holidays

🚀 Quarterly Team Gatherings

☕ In Office Amenities

Want to learn more about what we are up to?

Interviewing + Culture at Replit

To achieve our mission of making programming more accessible around the world, we need our team to be representative of the world. We welcome your unique perspective and experiences in shaping this product. We encourage people from all kinds of backgrounds to apply, including and especially candidates from underrepresented and non-traditional backgrounds.

Data Scientist, AI Agent