Lead Site Reliability Engineer

6 Days Old

Job Description

My client is transforming observability with a modern, full-stack platform that delivers logs, metrics, traces, and security monitoring, cutting costs by up to 70% while boosting efficiency.


They are looking for a Lead SRE to own and elevate their Alerting & Incident Management platform. You’ll be the driving force behind reliability, customer satisfaction, and product excellence, ensuring smooth alert management, fewer engineering interruptions, and a best-in-class incident response experience.


This role blends technical depth, customer impact, and product strategy — perfect for someone who thrives at the intersection of engineering, incident response, and product innovation.


What You’ll Do

  • Champion customer experience by speeding up alert resolution and reducing interruptions for engineers.
  • Build solutions to common pain points, shaping roadmaps, documentation, and technical knowledge.
  • Develop benchmarking tools to improve performance, reliability, and scalability.
  • Stay ahead of incident management trends to drive new workflows and product improvements.
  • Mentor teams and lead with clear, impactful communication.


What We’re Looking For

  • 5+ years in software engineering, DevTools, or infrastructure.
  • Strong expertise in incident management, alert routing, and large-scale orchestration.
  • SaaS or incident management platform experience (PagerDuty, OpsGenie, or similar is a plus).
  • Solid technical foundation with cloud/distributed systems.
  • Excellent communicator, comfortable working across US/IL time zones.
  • Bonus: leadership experience, SRE/DevOps background, knowledge of SLO/SLA practices.

Location: City of London
Category: Technology
