Job Overview

Speechify is seeking an AI Engineer & Researcher, Inference to join our team in San Francisco, USA. This role blends research and engineering to build and optimize high-performance inference systems for speech models. You will collaborate with the ML research and product teams to push the state of the art in model latency and memory efficiency and to deploy models reliably to production. The ideal candidate is proficient in modern ML frameworks and fluent in turning research insights into production-ready inference tooling.

Responsibilities

- Design, implement, and optimize scalable inference pipelines for speech models.
- Conduct applied research on efficient inference techniques (e.g., quantization, pruning, distillation) and integrate them into production systems.
- Evaluate models for latency, accuracy, and memory usage; develop benchmarks and monitor performance in production.
- Collaborate with ML researchers and product teams to deploy reliable, real-time speech solutions.
- Build tooling to automate experiments, track results, and support model lifecycle management.

Requirements

- Strong background in machine learning and systems, with a focus on inference for deep models.
- Proficiency in Python and at least one ML framework (PyTorch or TensorFlow).
- Experience with C++/CUDA and performance-oriented programming is a plus.
- Familiarity with distributed systems, GPUs, and cloud-based deployments.
- Excellent collaboration skills and the ability to translate research ideas into production-ready solutions.

Nice to Have

- Experience in speech recognition, speech synthesis, or related audio processing domains.
- Publications or open-source contributions in ML research.
- Experience with quantization, pruning, distillation, or other model compression techniques.