Site Reliability Engineer - CO - G7
Overview
As a Site Reliability Engineer on the GovWifi service, you will be part of a cross-disciplinary team responsible for the secure, reliable, and efficient operation of a government-critical platform. Your work will directly support thousands of users across the UK public sector, helping to provide a seamless and secure WiFi experience in government buildings nationwide.
Responsibilities
Maintaining service reliability: Monitor, manage and improve the availability of GovWifi, ensuring the platform consistently meets service level objectives. Respond to and resolve incidents quickly, serving as a point of escalation when needed.
Automating infrastructure: Use Terraform (or other IaC tools) to automate deployments and infrastructure changes, reducing manual intervention and improving consistency.
Deploying securely: Carry out safe, reliable deployments of code and configuration into AWS environments (ECS, EC2, CloudWatch, ELB, CodeBuild, CodePipeline).
Improving system resilience: Design, build and implement monitoring, alerting, and recovery mechanisms to keep systems highly available and secure.
Mitigating risks: Identify, assess, and reduce security vulnerabilities across the platform, applying web security best practices and implementing protective measures.
Supporting migrations and transitions: Assist with tool changes, platform improvements, or policy-driven migrations that affect GovWifi operations.
Building for users: Develop new features or improvements through prototyping, proof-of-concepts, and continuous iteration in collaboration with product managers and developers.
Knowledge sharing: Document technical decisions clearly, add to the team’s knowledge base, and explain complex issues to non-technical colleagues in a clear, supportive way.
Customer support: Respond to end-user requests and issues through support tools such as Zendesk, helping to resolve technical problems that directly affect users.
Driving continuous improvement: Pair with teammates, contribute to engineering improvement initiatives, and promote best practices across the service.
Ways of working
You’ll spend your time collaborating closely with site reliability engineers, developers, product managers, and central teams. You’ll work independently when needed, but also in pairs and group settings to solve problems. You’ll play an active role in incident reviews, retrospectives, and roadmap planning. The role requires curiosity, adaptability, and a commitment to secure, user-centred service delivery.
- Location: United Kingdom
- Job Type: Full-time
- Category: Engineering