Overview
Speechify is seeking a Senior Software Engineer, AI Model Serving, to join our Manila team. You will design, implement, and scale AI model-serving infrastructure that delivers low-latency inference for production-grade models and features.
Responsibilities
- Build and maintain scalable model-serving pipelines for ML inference using Python and modern APIs (e.g., FastAPI, Flask).
- Design APIs for model discovery, versioning, deployment, and experimentation workflows.
- Optimize latency, throughput, and memory usage; perform profiling; manage GPU utilization for efficient inference.
- Deploy and operate services on Kubernetes with Docker, establish CI/CD pipelines, and ensure reliability and observability.
- Implement monitoring, logging, tracing, and alerting (e.g., Prometheus, Grafana, Sentry) to maintain service health.
- Collaborate with ML engineers to deploy model updates and build tooling for model monitoring and governance.
- Write clean, well-documented code, participate in code reviews, and contribute to architectural decisions.
- Mentor junior engineers and help shape the platform for scale and security.
Requirements
- 5+ years of software engineering experience with a strong track record in production systems.
- Hands-on experience deploying ML models in production and working with model-serving frameworks (e.g., TorchServe, TensorFlow Serving, ONNX Runtime, Triton Inference Server).
- Proficiency in Python and solid software design skills.
- Experience with cloud platforms (AWS, GCP, Azure).
- Experience with containerization (Docker) and orchestration (Kubernetes).
- Familiarity with CI/CD pipelines and testing best practices.
- Knowledge of monitoring/observability tools (Prometheus, Grafana, Sentry).
- Nice to have: experience with GPUs, ML workflows, large language models, and languages such as Rust or Go.
About Speechify
Speechify is a fast-growing company delivering AI-powered reading and text-to-speech solutions. This role is based in Manila, Philippines, and offers a collaborative environment focused on building scalable, reliable AI infrastructure.