Job Description
Speechify is seeking a Senior Software Engineer to join our AI Model Serving team in Edinburgh, United Kingdom. You will design and implement scalable, low-latency inference services for deploying and monitoring machine learning models in production, working closely with ML engineers, data scientists, and product teams.
Responsibilities
- Design, implement, and scale AI model serving infrastructure and APIs for production environments.
- Build low-latency inference services and optimize model performance for end users.
- Integrate with model registries, monitoring, logging, and observability systems.
- Collaborate with ML, data, and product teams to deliver features end-to-end.
- Mentor teammates and contribute to architectural decisions and best practices.
- Maintain security, reliability, and compliance in production systems.
Requirements
- 5+ years of software engineering experience with Python and/or C++.
- Strong background in ML model deployment, including ML frameworks and model serving tools (TensorFlow, PyTorch, TorchServe, or equivalent).
- Experience with Kubernetes, Docker, and cloud platforms (AWS, GCP, or Azure).
- Proficiency in building distributed systems, APIs, and microservices.
- Familiarity with monitoring/observability tools (Prometheus, Grafana, etc.).
- BS or MS in Computer Science or equivalent practical experience.
Nice to have
- Experience with edge or latency-sensitive inference and with model serialization formats (ONNX, TorchScript).
- Knowledge of CI/CD pipelines and software reliability practices.
Location: Edinburgh, United Kingdom (On-site)