Your Role
Are you passionate about observability and resiliency? Is ensuring we know about issues before our customers second nature to you? Does being at the front and orchestrating processes sound fun to you? emnify is seeking a talented Reliability Engineer & Incident Management Operator to drive the company's incident management routines, be the authority for everything observability and resiliency, and guide internal stakeholders with best practices.
As a part of the larger Engineering department, our Platform team plays a crucial role in enhancing our competitive edge by improving developer experience to increase development efficiency and scale productivity. You will join a team of 3 engineers, fostering empathy and a collaboration mindset to ensure continuous improvement of development experience at emnify. The ideal candidate will have extensive experience with AWS cloud infrastructure, microservices, and modern observability practices as well as strong communication and organizational skills.
The position is 35% incident management operations, 35% observability and monitoring work, and 30% platform engineering and developer support.
Emnify technology radar
The position is based in emnify’s office in Berlin.
Your Impact:
Lead and optimize the incident management process end-to-end, ensuring timely detection, resolution, and documentation of incidents; coordinating cross-functional teams, conducting post-mortems and root cause analyses, and driving continuous improvements to workflows.
Design, implement, and continuously improve observability frameworks by developing dashboards, alerts, metrics, and logging strategies to monitor service health, detect anomalies proactively, support issue resolution, and ensure cost-optimized performance across the platform.
Partner with cross-functional teams to implement observability best practices, providing training and guidance on tools while leveraging metrics data to drive engineering priorities.
Leverage AWS to design, build, and maintain a resilient cloud infrastructure, implementing best practices for security, scalability, and cost optimization while ensuring high availability, disaster recovery, and robust platform components such as pipelines, shared infrastructure, and application services.
Your Skills:
• Proven experience as a (Site) Reliability Engineer or similar role in a SaaS and/or telecom company.
• Hands-on experience with observability tools (e.g., Prometheus, Mimir, Grafana, Loki, CloudWatch, Grafana IRM, Rootly), including setup and optimization of metrics and alerts.
• Experience in establishing and managing incident management processes.
• Understanding of incident management frameworks and best practices.
• Extensive experience with AWS cloud services (e.g., EC2, S3, RDS, Lambda, CloudWatch).
• Expert skills with modern infrastructure tooling and principles (Kubernetes; IaC with Terraform; CI/CD with GitHub Actions and Jenkins).
• Good understanding of modern development tooling and principles (e.g., microservices architecture, 12-factor applications, Docker).
• Advanced documentation skills for effective knowledge sharing and collaboration.
• Exceptional problem-solving and critical thinking with a passion for enhancing development experiences in fast-paced tech environments.
• Ability to work independently and as part of a team.
Nice to have:
• Knowledge of networking protocols and telecom systems.
• Knowledge of secure software development.
• Familiarity with programming languages such as Python, Go, or Java.
• Certification in AWS (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect).