Senior Site Reliability Engineer (SRE)
Overview
Role - Senior Site Reliability Engineer (SRE)
Location - London (onsite full-time, 5 days a week)
Salary - Permanent, up to 80K gross
Minimum requirement: 12+ years of experience
Core Competencies / Responsibilities
- Hands-on experience with monitoring, observability and chaos engineering tools: Datadog, Splunk, Dynatrace, Grafana, Prometheus, ThousandEyes, Gremlin, etc.
- Proficiency in creating dashboards for infrastructure, APM and end-to-end (E2E) workflows.
- Monitoring, logging, alerting and error budgets (SLA targets: 99.9%, 99.99%, 99.999%) for software, operations and business.
- Define SLOs, SLIs and SLAs with business, operations and engineering teams.
- Automation and auto-healing using Python, shell scripting and JavaScript, including developing custom monitoring services.
- Experience with logging, monitoring, and event detection on cloud or distributed platforms.
- ITIL: incident and change management; proficient in problem management, including blameless postmortems, findings, permanent fixes and lessons-learned documentation.
- Technical operations: application support, stability, reliability and resiliency experience.
- DevOps, Ansible, Terraform, Docker, AWS (Atlas), Jenkins CI/CD pipelines.
- Unix/Linux, Windows Server, Oracle, MSSQL, MongoDB.
Location: London
Job Type: Full-time
Category: Engineering