Senior Software Engineer, AI Model Serving - Kathmandu, Nepal
Speechify is looking for a Senior Software Engineer to design, build, and operate scalable AI model serving infrastructure for production-grade NLP/ML workloads. This on-site role is based in Kathmandu, Nepal, and you’ll collaborate with ML researchers and software engineers to deploy, optimize, and maintain model serving pipelines powering Speechify’s AI features.
About Speechify
Speechify is a leading AI-powered text-to-speech platform that helps people listen to content with ease. We are focused on delivering fast, reliable, and scalable systems that bring advanced AI capabilities to users worldwide.
Responsibilities
- Design, implement, and scale high-performance AI model serving systems for real-time and batch inference.
- Develop APIs and microservices to expose model predictions to frontend apps and external partners.
- Collaborate with ML scientists to optimize models for latency, throughput, and resource usage, using techniques such as quantization and distillation.
- Implement CI/CD pipelines for ML workflows, containerize services with Docker, and deploy to Kubernetes clusters.
- Monitor service reliability, build out observability (metrics, traces, logs), and establish robust alerting.
- Optimize resource usage across cloud and on-prem environments; implement batching, caching, and autoscaling strategies.
- Mentor junior engineers and contribute to architectural decisions across the AI platform.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 5+ years of software engineering experience, including hands-on ML model serving.
- Proficiency in Python; experience with ML frameworks (TensorFlow, PyTorch) and model serving tools (TensorFlow Serving, TorchServe, Triton Inference Server).
- Experience with Kubernetes, Docker, and cloud platforms (AWS, Google Cloud Platform, Azure).
- Strong knowledge of APIs, distributed systems, and microservices architectures.
- Experience with monitoring/logging stacks (Prometheus, Grafana, OpenTelemetry).
- Excellent communication and collaboration skills; ability to work with cross-functional teams.
Nice-to-have
- Experience with large language models and GPU-accelerated workloads.
- Familiarity with CI/CD for ML, feature stores, or data pipelines.
We offer competitive compensation, comprehensive benefits, and opportunities for growth.