Senior Site Reliability Engineer - Midnight
Greater London, England, United Kingdom
About IOG and Midnight
IOG is a technology company focused on blockchain research and development. We are renowned for our scientific approach to blockchain development, emphasizing peer‑reviewed research and formal methods to ensure security, scalability, and sustainability. Our projects include DeFi, governance, and identity management, aiming to advance the capabilities and adoption of blockchain technology globally. Midnight is a core contributor to the Midnight Network, a blockchain platform that safeguards personal and business data using programmable data isolation and zero‑knowledge proofs to enable selective disclosure. We are committed to curiosity, innovation, and positive change across our teams.
What the Role Involves
As a Senior SRE, you will help shape the reliability and performance of our systems across our cloud infrastructure. You will design and implement solutions that improve service reliability, automate routine tasks, and facilitate collaboration between development and operations teams. This role requires deep technical expertise, a proactive mindset, and the ability to turn evolving challenges into robust, workable solutions.
Responsibilities
- Infrastructure & Automation: Design, build, and maintain scalable and highly available systems, primarily on AWS, using best practices.
- Manage and optimize Kubernetes clusters for high availability and performance, expanding them when beneficial.
- Leverage GitOps to automate deployments and manage container orchestration.
- Implement and manage CI/CD pipelines to ensure seamless, high‑quality deployments; identify bottlenecks and improve feedback loops; automate toil.
- Develop automation tools and scripts to improve operational efficiency.
- Monitoring & Incident Response: Implement robust monitoring with Prometheus and related tooling to ensure system health and performance.
- Participate in on‑call rotations and lead incident response efforts, turning challenges into learning opportunities.
- Collaborate with development teams to define and implement SLOs/SLIs.
- Problem Solving & Communication: Take vague problems and distill them into clear, actionable plans; communicate solutions and retrospectives effectively to technical and non‑technical stakeholders.
- Innovation & Continuous Improvement: Evaluate and adopt new technologies, with blockchain experience considered advantageous; document processes and ensure knowledge sharing; balance delivery with high standards and polish.
Requirements
- 7+ years of experience in SRE, DevOps, or related roles.
- Strong understanding of SRE best practices, architectures, resiliency patterns, and cloud security.
- Proficiency in Python, Golang, or JavaScript; Rust is advantageous.
- Hands‑on experience with AWS and modern cloud architectures; proficiency with Helm, Terraform, and CI/CD tools (GitHub Actions, ArgoCD).
- Experience with Kubernetes/EKS and GitOps methodologies.
- Experience with monitoring tools such as Prometheus and OpenTelemetry; familiarity with the LGTM stack or similar tools.
- Blockchain experience is advantageous; strong problem‑solving and cross‑functional collaboration skills.
- Experience in Agile environments and distributed teams; strong communication and proactive mindset.
Benefits
- Remote work
- Laptop reimbursement
- New starter package for hardware essentials
- Learning & Development opportunities
- Competitive PTO
At IOG, we are committed to fostering a diverse and inclusive workplace where all individuals are valued. We welcome people of all backgrounds and ensure that employment decisions are based on merit, qualifications, and potential. Everyone is given equal opportunities regardless of race, color, religion, national origin, gender, gender identity, sexual orientation, age, marital status, veteran status, disability, or any other characteristic protected by law.
Other
- Seniority level: Mid‑Senior level
- Employment type: Full‑time
- Job function: Engineering and Information Technology
- Industries: Online Audio and Video Media
- Location: Greater London, England, United Kingdom
- Salary: £100,000 - £125,000
- Job type: Full‑time
- Category: Engineering