Staff Software Engineer - AI Research Infrastructure

Added: 3 days ago
Type: Full time

Related skills

cloud, Kubernetes, distributed systems, job scheduling, HPC

📋 Description

  • Design and operate research infrastructure, including GPU compute, for large-scale experiments.
  • Build services to schedule, orchestrate, and observe workloads.
  • Improve tooling and monitoring to boost researcher productivity.
  • Influence roadmaps for research compute, training, and delivery.
  • Mentor engineers on compute, infra, and AI systems.
  • Partner with researchers, ML engineers, and platform teams.

🎯 Requirements

  • BS, MS, or PhD in Computer Science or a related field.
  • 5+ years in software engineering with large-scale distributed systems.
  • Deep experience building/operating distributed systems and data pipelines.
  • Proficient in one or more systems languages (C++, Rust, Go, Java, Scala).
  • Experience with cluster schedulers or large-scale job orchestration (Kubernetes, Slurm, Ray).
  • Understanding of ML training and inference workflows (distributed training, evaluation).

🎁 Benefits

  • Comprehensive benefits; region-specific details are available online.
  • Inclusive culture with a commitment to diversity.
  • Work on cutting-edge AI research infrastructure.
  • Global teams and offices, including a presence in San Francisco.