Cohere

51-200 employees
21 jobs posted




Staff Software Engineer, GPU Infrastructure (HPC)

Added: 4 days ago
Location: 🇨🇦 Canada
Type: Full-time
Salary: not provided
Related skills: Python, Kubernetes, PyTorch, JAX, GPU

📋 Description

Build and scale ML HPC infra: Kubernetes GPU/TPU clusters across clouds.
Optimize AI/ML training: cost, reliability, performance; RDMA/NCCL/interconnects.
Troubleshoot bottlenecks and failures to minimize disruption.
Enable researchers with self-service tools to monitor, debug, and optimize training jobs.
Drive innovation in ML infrastructure with JAX, PyTorch, and distributed training.
On-call rotation (24x7) with compensation.

🎯 Requirements

Deep ML/HPC infra expertise: GPU/TPU clusters and distributed training.
Kubernetes at scale: deploy/manage cloud-native clusters for AI workloads.
Strong programming: Python for ML tooling and Go for systems; open-source favored.
Linux internals, RDMA networking, HPC performance tuning.
Research collaboration experience with AI researchers/ML engineers.
Self-directed problem solving and driving impact in fast-paced environments.

🎁 Benefits

Open and inclusive culture and work environment.
Work with cutting-edge AI research.
Weekly lunch stipend, in-office lunches and snacks.
Health and dental benefits, mental health budget.
Parental leave top-up for up to 6 months.
Remote-flexible with offices in multiple cities and coworking stipend; 6 weeks vacation.

