Related skills
jenkins, github actions, aws, kubernetes, eks
Your Role
Are you passionate about observability and resiliency? Is ensuring we know about issues before our customers do second nature to you? Does being at the front and orchestrating processes sound fun to you? emnify is seeking a talented Reliability Engineer & Incident Management Operator to drive the company's Incident Management routines, be the authority for everything observability and resiliency, and guide internal stakeholders with best practices.
As part of the larger Engineering department, our Platform team plays a crucial role in enhancing our competitive edge by improving developer experience to increase development efficiency and scale productivity. You will join a team of 3 engineers, fostering empathy and a collaborative mindset to ensure continuous improvement of the development experience at emnify. The ideal candidate will have extensive experience with AWS cloud infrastructure, microservices, and modern observability practices, as well as strong communication and organizational skills.
The position is 35% incident management operations, 35% observability and monitoring work, and 30% platform engineering and developer support.
emnify technology radar
The position is based in emnify’s office in Berlin.
Your Impact:
- Incident management operations:
Lead and optimize the incident management process end-to-end, ensuring timely detection, resolution, and documentation of incidents; coordinating cross-functional teams, conducting post-mortems and root cause analyses, and driving continuous improvements to workflows.
- Observability and monitoring:
Design, implement, and continuously improve observability frameworks by developing dashboards, alerts, metrics, and logging strategies to monitor service health, detect anomalies proactively, support issue resolution, and ensure cost-optimized performance across the platform.
- Collaboration and Support:
Partner with cross-functional teams to implement observability best practices, providing training and guidance on tools while leveraging metrics data to drive engineering priorities.
- Platform engineering:
Leverage AWS to design, build, and maintain a resilient cloud infrastructure, implementing best practices for security, scalability, and cost optimization while ensuring high availability, disaster recovery, and robust platform components such as pipelines, shared infrastructure, and application services.
Your Skills:
• Proven experience as a (Site) Reliability Engineer or similar role in a SaaS and/or telecom company.
• Hands-on experience with observability tools (e.g., Prometheus, Mimir, Grafana, Loki, CloudWatch, Grafana IRM, Rootly), including setup and optimization of metrics and alerts.
• Experience in establishing and managing incident management processes.
• Understanding of incident management frameworks and best practices.
• Extensive experience with AWS cloud services (e.g., EC2, S3, RDS, Lambda, CloudWatch).
• Expert skills with modern infrastructure tooling and principles (Kubernetes, IaaC - Terraform, CI/CD - GitHub Actions, Jenkins)
• Good understanding of modern development tooling and principles (e.g., microservices architecture, 12-factor applications, Docker)
• Advanced documentation skills for effective knowledge sharing and collaboration.
• Exceptional problem-solving and critical thinking with a passion for enhancing development experiences in fast-paced tech environments.
• Ability to work independently and as part of a team.
Nice to have:
• Knowledge of networking protocols and telecom systems
• Knowledge of secure software development
• Familiarity with programming languages such as Python, Go, or Java.
• Certification in AWS (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect)