Related skills
python tensorflow pytorch cuda nsight compute
Description
- Analyze and optimize DL networks running on the AV.
- Optimize software architecture, performance, and latency for DL apps.
- Deploy DL models on the AV and train them in large-scale data centers.
- Troubleshoot performance with profiling and roofline techniques.
- Collaborate with cross-functional teams to improve self-driving tech.
- Learn quickly, adapt to new DL tech, and communicate clearly.
Requirements
- 5+ years of software engineering experience.
- BS/MS/PhD in CS or related field.
- Strong CUDA, C++, and Python programming.
- Experience in HPC/parallel programming, including optimizing GPU memory use, latency, and throughput.
- Proficiency with NVIDIA Nsight Systems, Nsight Compute, and the roofline model.
- Hands-on DL/ML optimization at framework level (PyTorch/TensorFlow).
- Strong knowledge of CV and transformer DL architectures and foundational NN blocks.
- Strong analytical skills to diagnose performance bottlenecks.
- Ability to learn quickly and adapt in a fast-paced environment.
- Experience with large codebases in fast-growing environments.
- Strong communication for cross-functional teamwork.
- Comfortable in Linux/Unix environments.
- Experience in motion planning, robotics, or related fields (desirable).
- Experience with TensorRT, OpenAI Triton, Mojo (desirable).
Benefits
- Hybrid work environment with 3 days per week in the office.
- Annual bonus, equity, and benefits.