Lead Site Reliability Engineer

New Today

Remember to check your CV before applying Also, ensure you read through all the requirements related to this role. MY client are transforming observability with a modern, full-stack platform that delivers logs, metrics, traces, and security monitoring — cutting costs by up to 70% while boosting efficiency. They are looking for a Lead SRE to own and elevate our Alerting & Incident Management platform . You’ll be the driving force behind reliability, customer satisfaction, and product excellence — ensuring smooth alert management, fewer engineering interruptions, and a best-in-class incident response experience. This role blends technical depth, customer impact, and product strategy — perfect for someone who thrives at the intersection of engineering, incident response, and product innovation. What You’ll Do Champion customer experience by speeding up alert resolution and reducing interruptions for engineers. Build solutions to common pain points, shaping roadmaps, documentation, and technical knowledge. Develop benchmarking tools to improve performance, reliability, and scalability. Stay ahead of incident management trends to drive new workflows and product improvements. Mentor teams and lead with clear, impactful communication. What We’re Looking For 5+ years in software engineering, DevTools, or infrastructure. Strong expertise in incident management, alert routing, and large-scale orchestration. SaaS or incident management platform experience (PagerDuty, OpsGenie, etc. a plus). Solid technical foundation with cloud/distributed systems. Excellent communicator, comfortable working across US/IL time zones. Bonus: leadership experience, SRE/DevOps background, knowledge of SLO/SLA practices.
#J-18808-Ljbffr
Location:
London, England
Category:
Engineering

We found some similar jobs based on your search