Site Reliability Engineer


We're hiring several Senior Site Reliability Engineers to help shape a Centre of Excellence for SRE practices across a global tech estate. This is a high-impact, hands-on role where you'll engineer automation frameworks, elevate observability, and transform incident response at scale.

You’ll be the go-to expert guiding strategy, influencing culture, and driving adoption of SRE principles across diverse teams. From scripting to architecting resilient systems, your technical leadership will directly improve performance, scalability, and availability.

What you’ll do:

System Reliability & Performance: Ensure high availability, optimal performance, and scalability of services through proactive monitoring, maintenance, and capacity planning.

Incident Response & Prevention: Lead resolution and analysis of system outages. Implement preventative measures to reduce recurrence and improve system resilience.

Automation & Tooling: Develop scripts and tools in Python or Go to automate operational processes, reduce manual effort, and enhance efficiency.

Performance Optimization: Monitor system metrics, identify bottlenecks, and apply best practices for performance tuning and resource utilization.

Cross-Team Collaboration: Partner with development and infrastructure teams to embed reliability and scalability into the software development lifecycle.

Seniority level:
Mid-Senior level
Employment type:
Full-time
Job function:
Information Technology
Industries:
Technology, Information and Media and Financial Services

The pay range is provided by Caspian One. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.
Location:
Glasgow, Scotland, United Kingdom
Job Type:
Full-time
