Related skills
Python, distributed systems, CI/CD, MLOps, LLMs
Description
- Design scalable, low-latency model-serving infra for LLMs.
- Build APIs and services for real-time conversational workloads.
- Optimize inference for throughput, latency, and cost.
- Architect end-to-end ML pipelines from training to deployment.
- Lead architecture decisions for production AI at scale.
Requirements
- 1–4 years of experience in ML engineering, backend systems, or distributed infrastructure.
- Proven experience deploying ML models in production.
- Strong Python and/or C++ programming skills.
- Experience with large-scale model serving (LLMs, transformers).
- Deep understanding of distributed systems, API design, and cloud infrastructure.
- Experience with MLOps tooling: CI/CD, monitoring, and experiment tracking.
Benefits
- Diverse medical, dental, and vision options.
- 401(k) matching program.
- Unlimited paid time off.
- Parental leave and flexibility for all parents and caregivers.
- Support for country-specific visa needs for international employees in the Bay Area.