Research Compute Operations

Added
less than a minute ago
Type
Full time

Related skills

dashboards, platform engineering, ML infrastructure, self-service, Claude

📋 Description

  • Serve as the primary contact for researchers using internal compute infrastructure and triage access issues.
  • Proactively monitor usage and help researchers optimize workloads.
  • Help design the roadmap for research inference tooling; gather feedback and drive progress.
  • Prototype dashboards, automations, and self-service workflows.
  • Build automations (using Claude) for common operational workflows.

🎯 Requirements

  • Engineering background or comparable technical depth, with a transition into product, technical operations, or systems design.
  • Ability to query data, understand infrastructure, debug, and prototype tools quickly.
  • Systems thinker who analyzes root causes and prevents recurrence.
  • Comfort navigating ambiguity across teams and balancing tactical vs strategic work.
  • Daily use of Claude or AI tools and willingness to share best practices.
  • Understanding of compute infrastructure, including rate limiting, autoscaling, and prioritization.
  • ML infrastructure/ML engineering/research engineering background.
  • Experience with large-scale accelerator clusters (TPUs/GPUs).
  • Familiarity with ML training pipelines and inference capacity.
  • Track record building internal tools or developer platforms.
  • Experience in DevEx or platform engineering.

🎁 Benefits

  • Competitive compensation and benefits.
  • Optional equity donation matching.
  • Generous vacation and parental leave.
  • Flexible working hours.
  • Lovely office space.

🛃 Visa sponsorship
