Design, build, and maintain scalable, highly available, and fault-tolerant infrastructure for web services and ML workloads.
Ensure platform, inference and model training environments are highly available and replicable across HPC clusters.
Operate production systems and troubleshoot issues (on-call, data extraction, admin tasks, scaling).
Implement and improve monitoring, alerting, and incident response to minimize downtime.
Build and maintain CI/CD, containerization, orchestration, logging and alerting for client APIs and large training runs.
Participate in on-call rotations, perform root-cause analysis, and prevent future incidents.

Master’s degree in Computer Science, Engineering or related field.
7+ years of DevOps/SRE experience.
Strong experience with cloud computing and highly available distributed systems.
Hands-on experience with CI/CD, containerization, and orchestration using Docker and Kubernetes.
Monitoring and observability tools: Prometheus, Grafana, Datadog, ELK Stack.
Infrastructure-as-code tools like Terraform or CloudFormation.

Site Reliability Engineer
