Your Role

Are you passionate about observability and resiliency? Is ensuring we know about issues before our customers do second nature to you? Does being at the front and orchestrating processes sound fun to you? emnify is seeking a talented Reliability Engineer & Incident Management Operator to drive the company's incident management routines, be the authority for everything observability and resiliency, and guide internal stakeholders with best practices.

As a part of the larger Engineering department, our Platform team plays a crucial role in enhancing our competitive edge by improving developer experience to increase development efficiency and scale productivity. You will join a team of 3 engineers, fostering empathy and a collaboration mindset to ensure continuous improvement of the development experience at emnify. The ideal candidate will have extensive experience with AWS cloud infrastructure, microservices, and modern observability practices, as well as strong communication and organizational skills.

The position is 35% incident management operations, 35% observability and monitoring work, and 30% platform engineering and developer support.

emnify technology radar

The position is based in emnify’s office in Berlin.

Your Impact:

Incident management operations:

Lead and optimize the incident management process end-to-end, ensuring timely detection, resolution, and documentation of incidents; coordinating cross-functional teams, conducting post-mortems and root cause analyses, and driving continuous improvements to workflows.

Observability and monitoring:

Design, implement, and continuously improve observability frameworks by developing dashboards, alerts, metrics, and logging strategies to monitor service health, detect anomalies proactively, support issue resolution, and ensure cost-optimized performance across the platform.

Collaboration and Support:

Partner with cross-functional teams to implement observability best practices, providing training and guidance on tools while leveraging metrics data to drive engineering priorities.

Platform engineering:

Leverage AWS to design, build, and maintain a resilient cloud infrastructure, implementing best practices for security, scalability, and cost optimization while ensuring high availability, disaster recovery, and robust platform components such as pipelines, shared infrastructure, and application services.

Your Skills:

• Proven experience as a (Site) Reliability Engineer or similar role in a SaaS and/or telecom company.

• Hands-on experience with observability tools (e.g., Prometheus, Mimir, Grafana, Loki, CloudWatch, Grafana IRM, Rootly), including setup and optimization of metrics and alerts.

• Experience in establishing and managing incident management processes.

• Understanding of incident management frameworks and best practices.

• Extensive experience with AWS cloud services (e.g., EC2, S3, RDS, Lambda, CloudWatch).

• Expert skills with modern infrastructure tooling and principles (Kubernetes, IaC with Terraform, CI/CD with GitHub Actions and Jenkins).

• Good understanding of modern development tooling and principles (e.g., microservices architecture, 12-factor applications, Docker).

• Advanced documentation skills for effective knowledge sharing and collaboration.

• Exceptional problem-solving and critical thinking with a passion for enhancing development experiences in fast-paced tech environments.

• Ability to work independently and as part of a team.

Nice to have:

• Knowledge of networking protocols and telecom systems.

• Knowledge of secure software development.

• Familiarity with programming languages such as Python, Go, or Java.

• Certification in AWS (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect).

EMnify


Senior Site Reliability Engineer & Incident-Manager (m/f/d)
