Related skills
speech pytorch audio automatic_speech_recognition neural_audio_codecs
Description
- Research, develop, and optimize voice/audio neural models (TTS/ASR).
- Build production training and inference pipelines for voice models, with a focus on latency.
- Run end-to-end experiments: data, design, training, evaluation, ablations.
- Collaborate with ML, product, and infra to deploy voice models in Pi.
- Explore neural audio codecs, diffusion synthesis, streaming, and multimodal models.
- Develop evaluation frameworks with perceptual metrics and benchmarks.
- Contribute to Inflection's research culture via publications and reviews.
Requirements
- 2-5 years of experience in audio, speech, or multimodal ML (research or engineering).
- Strong PyTorch proficiency; experience training/debugging large-scale models on GPUs.
- Solid understanding of audio/speech: spectrograms, mel, vocoders.
- Able to take ideas from prototype to production; CUDA-aware training loops.
- Familiar with diffusion, autoregressive codecs, and flow matching for audio.
- Clear, collaborative communication for cross-functional teams.
- BS/BA in CS/EE/Linguistics or related; MS/PhD preferred.
Benefits
- Medical, dental, and vision coverage.
- 401(k) matching.
- Unlimited PTO.
- Parental leave and caregiver flexibility.
- Visa support for international Bay Area employees.
Visa sponsorship