This job is no longer available

The job listing you are looking for has expired.

Site Reliability Engineer


2 months ago
Not Specified

FARFETCH exists for the love of fashion. Our mission is to be the global platform for luxury fashion, connecting creators, curators and consumers.

We're a positive platform for good, bringing together an incredible creative community made up of our people, our partners and our customers. This community is at the heart of our business success. We welcome differences, empower individuality and celebrate diverse skills and perspectives, creating an inclusive environment for everyone. We are FARFETCH for All.


We're on a mission to build the technology that powers the global platform for luxury fashion. We operate a modular end-to-end technology platform purpose-built to connect the luxury fashion ecosystem worldwide, addressing complex challenges and enjoying it. We're empowered to break traditions and revolutionise, with the freedom and autonomy to make a difference for our customers all over the world.


Our Porto office is located in Portugal's vibrant second city, known for its history and its creative yet cosy environment. From Account Management to Technology and Product, whatever your skills are, you'll find your fit here. You can have an informal meeting in the treehouse or play the piano in your lunch break!


At Farfetch, the Site Reliability Engineer (SRE) is responsible for ensuring the reliability, availability, and performance of the company's website and applications. This role involves close collaboration with both the development and operations teams to build and maintain a scalable and robust infrastructure that supports Farfetch's business objectives. As a Site Reliability Engineer, you will be part of a team that serves as a bridge between our department and the Infrastructure department. SREs in this position have the autonomy to explore and promote reliability best practices across the organization, acting as consulting partners for all tech-related areas.


  • Design and implement highly available and scalable systems, ensuring the reliability and performance of the company's website and applications
  • Collaborate with cross-functional teams to define and establish service level objectives (SLOs) and service level agreements (SLAs) for critical systems
  • Monitor systems and applications, proactively identifying and resolving any performance bottlenecks or availability issues
  • Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance
  • Conduct post-incident analyses to identify root causes and implement preventive measures to avoid future incidents
  • Automate repetitive tasks and processes to improve efficiency and reduce manual intervention
  • Create and maintain documentation for system architecture, configuration, and troubleshooting procedures
  • Perform capacity planning and resource allocation to ensure optimal system performance and scalability
  • Collaborate with development teams to implement and deploy new features and enhancements, ensuring they meet reliability and performance standards
  • Stay up to date with industry best practices, new technologies, and emerging trends in site reliability engineering

  • General knowledge of operating systems (Linux and Windows)
  • Experienced in designing, analyzing, and troubleshooting large-scale distributed systems
  • Programming experience in at least one of the following languages: C#, Java, or Python. Experience with scripting languages is also a plus
  • Experience with configuration management tools like Ansible, Puppet, Chef or Salt (preferably Salt)
  • Familiarity with cloud platforms like AWS, Azure, or Google Cloud (preferably Azure)
  • Understanding of basic networking principles and protocols (TCP/IP, HTTP, DNS, etc.)
  • Knowledge of containerization and orchestration technologies (Docker, Kubernetes)
  • Expertise in monitoring and logging tools such as Prometheus, Grafana, ELK stack, or Splunk
  • Strong problem-solving and troubleshooting skills, with the ability to analyze and resolve complex technical issues
  • Excellent communication and collaboration skills to work effectively with cross-functional teams. Fluency in English is required
  • Strong attention to detail and ability to work in a fast-paced, dynamic environment
  • Solid understanding of software development methodologies and DevOps principles
  • Certification in relevant technologies or frameworks is a plus (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator)
  • Familiarity with continuous integration/continuous deployment (CI/CD) pipelines
  • Experience with source control systems such as Git or SVN
  • Experienced in identifying and addressing toil
