Site Reliability Engineer
Description
We are looking for an experienced Site Reliability Engineer to join the Igloo team in Cambridge to champion observability and delivery. The candidate should have strong communication skills, experience in coaching or sharing knowledge, and proficiency in Azure and observability platforms.
Join Insurance Consulting and Technology (ICT) during a transformative period aimed at enhancing customer and business value. You'll be part of a high-performing team renowned for quality delivery, rapid development, and team spirit. We have won the InsuranceERM Best use of Cloud Technology award three years in a row.
Igloo is embarking on new and exciting uses of its technology. In this role, you will have the opportunity to help the team and product handle exciting, complex and large-scale client propositions where observability will be essential, and to help transform how the product is designed and deployed.
You will join a cross-team guild of Site Reliability Engineers, which enables you to not only influence direction within your product family, but to also help shape how we handle observability and monitoring across ICT.
This role is open to flexible and hybrid working arrangements, with presence in the Cambridge office a minimum of two days per week.
The Role:
- Collaborate with cross-functional teams to ensure the reliability, availability, and performance of our client-facing services
- Maintain and configure observability platforms such as Datadog
- Proactively monitor production and other environments to ensure stability, availability, security and integrity
- Design and implement automation and processes to improve the efficiency and effectiveness of the teams and other support functions
- Engage with business stakeholders to gather requirements, address concerns, and provide updates on projects and system status
- Contribute to the design, build and operational management of the services
- Lead incident response, troubleshooting, and root cause analysis to mitigate and prevent future issues
- Work closely with engineering, support and operations teams to upskill and promote knowledge transfer, producing training materials and articles
- Participate in on-call rotation to provide support and ensure system uptime
Qualifications
The Requirements:
- Experience as a Site Reliability Engineer or in a similar role (such as DevOps)
- Familiarity with managing cloud-based services (ideally Azure), including observability, monitoring, scaling, and security
- Hands-on use of observability tools such as Datadog (or similar)
- Knowledge of automation, scripting (Python or PowerShell), and Infrastructure as Code (e.g., Terraform, Pulumi, ARM Templates, or Bicep)
- Experience with Azure DevOps Pipelines (or similar)
- Strong interpersonal, verbal, and written communication skills
- Ability to coach, mentor, and share knowledge with others
- Experience collaborating with external clients and cross-functional teams
- Customer-focused, with strong problem-solving skills
Other highly desirable, but not essential, skills are:
- Azure certifications, such as Azure Administrator, Azure Developer, or Azure DevOps Engineer
- Experience with containerization and orchestration (Docker, Kubernetes)
- Familiarity with programming languages such as C#
- Knowledge of Configuration as Code tools (e.g., Puppet, Ansible)
At WTW, we believe difference makes us stronger. We want our workforce to reflect the different and varied markets we operate in and to build a culture of inclusivity that makes colleagues feel welcome, valued and empowered to bring their whole selves to work every day. We are an equal opportunity employer committed to fostering an inclusive work environment throughout our organisation. We embrace all types of diversity.
((ICT_TECH TD_2025_47R))
- Location: Cambridge
- Job Type: Full Time
- Category: Financial Services