Lambda

51-200 employees
18 jobs posted

Please mention that you found this job on empllo.com. Thanks & good luck!

HPC Operations Engineer

Added

11 hours ago

Location

🌍 North America

Type

Full time

Related skills

linux kubernetes pytorch slurm infiniband

📋 Description

Remotely deploy and configure large-scale HPC clusters for AI workloads
Remotely install and configure OS, firmware, software, and networking on HPC clusters
Troubleshoot HPC cluster issues with on-site deployment teams
Provide clear and detailed requirements to other teams for gaps and improvements
Contribute to creation and maintenance of Standard Operating Procedures
Mentor and assist less experienced team members

🎯 Requirements

5+ years deploying and configuring HPC clusters for AI workloads
Deep understanding of HPC/AI architecture, OS, firmware, software, and networking
Expertise with SFP+ fiber, InfiniBand, and 100 GbE networks
Experience with Ethernet, switching, GPUDirect, RDMA, NCCL, and Horovod
Experience with Linux-based compute nodes, firmware updates, and driver installation
Experience with Slurm, Kubernetes, or other job scheduling systems

🎁 Benefits

Generous cash and equity compensation
Health, dental, and vision coverage
Wellness and commuter stipends
401(k) plan with 2% company match (USA)
Flexible paid time off plan
