Related skills
rust, docker, python, go, reinforcement learning
Description
- Design, build, and maintain sandboxed RL environments for agentic AI training.
- Develop reproducible, containerized execution environments for deterministic rollouts.
- Integrate with open-source agentic tooling and CLI/API harnesses.
- Build instrumentation and observability to capture training data.
- Collaborate with data ops to design curricula and evaluation protocols.
- Own environment deployment and reliability across CI/CD and versions.
Requirements
- 2+ years of software engineering experience; strong Python plus a systems language (Go, Rust, or C++)
- Experience running containerization/sandboxing (Docker/Podman/Firecracker) in production
- Familiarity with RL concepts: MDPs, reward shaping, episode structure
- Experience building developer tooling, CLI tools, or infra automation
- Comfort with browser automation or terminal interaction tooling
- Ability to read academic/open-source RL benchmarks and implement from papers
Benefits
- Location: SF Bay Area or Wroclaw, Poland
- Hybrid model with 2 days per week in office
- Fast-paced, high-intensity environment with ownership
- Growth opportunities tied to your impact
- Join the Alignerr team within Labelbox to work on data/ML infrastructure
- Vision: building foundational AI tech for humanity