Staff Software Engineer - AI Research Infrastructure

Type
Full time

Related skills

rust java kubernetes go scala

📋 Description

  • Design infra for large-scale experiments and training (HPC, GPU cloud)
  • Build job submission, scheduling, and monitoring abstractions
  • Create tooling to boost research productivity and reduce iteration time
  • Shape long-term infra roadmap for Databricks AI Research
  • Mentor engineers on compute, infra, and AI systems

🎯 Requirements

  • BS/MS or PhD in Computer Science or related field
  • 5+ years of software engineering experience, including large-scale distributed systems or infra
  • Deep experience building/operating distributed systems, data pipelines, or backend services with GPUs/clusters
  • Proficient in C++, Rust, Go, Java, or Scala
  • Built or contributed to cluster schedulers or large-scale job orchestration (Kubernetes, Slurm, Ray)
  • Understanding of modern ML training/inference workflows
  • Fast, pragmatic delivery with a focus on reliability and ops
  • Strong communication skills for working across research and engineering teams

🎁 Benefits

  • Comprehensive benefits and perks
  • Region-specific details via link
  • Commitment to diversity and inclusion