Lead Site Reliability Engineer

Overview
My client is transforming observability with a modern, full-stack platform that delivers logs, metrics, traces, and security monitoring, cutting costs by up to 70% while boosting efficiency. This role blends technical depth, customer impact, and product strategy, and suits someone who thrives at the intersection of engineering, incident response, and product innovation. You'll be the driving force behind reliability, customer satisfaction, and product excellence: ensuring smooth alert management, fewer engineering interruptions, and a best-in-class incident response experience.
Responsibilities
Champion customer experience by speeding up alert resolution and reducing interruptions for engineers.
Build solutions to common pain points, shaping roadmaps, documentation, and technical knowledge.
Develop benchmarking tools to improve performance, reliability, and scalability.
Stay ahead of incident management trends to drive new workflows and product improvements.
Mentor teams and lead with clear, impactful communication.
Qualifications
5+ years in software engineering, DevTools, or infrastructure.
Strong expertise in incident management, alert routing, and large-scale orchestration.
SaaS or incident management platform experience (PagerDuty, OpsGenie, or similar a plus).
Solid technical foundation in cloud and distributed systems.
Excellent communicator, comfortable working across US/IL time zones.
Bonus: leadership experience, SRE/DevOps background, knowledge of SLO/SLA practices.
Location: City of London, England, United Kingdom
Job Type: Full-time
