Site Reliability Engineer, London

Overview

Create the future of travel with us

Whether it’s visiting the people closest to us, starting an exciting adventure, or taking a career-defining business trip, travel is an essential part of our lives. Yet we've all experienced the aches and pains of getting to our destination. Today, more than 4 billion airline passengers rely on technology that hasn't kept up with the expectations of the modern connected traveller.

That’s why we’ve started to rebuild the infrastructure that underpins the travel industry. We’re on a mission to unravel travel — simplifying systems and building the tools that will make the future of travel effortless.

We were part of Y Combinator's S18 cohort and are backed by Benchmark, Blossom, Index Ventures and Kima Ventures, a fantastic set of investors that has helped build some of the world's largest companies.

Our team in London is growing and we’re looking for talented people to join us on our journey.

Engineering at Duffel

We're building tools to simplify travel distribution, search and booking. What does this actually mean? It's one common and seamless API. This brings huge technical challenges, as we need to design and build a beautiful API before integrating with hundreds of airlines. Along with that, we need to navigate the differing needs and systems of each airline whilst building a fantastic developer experience to go with it.

The tools used on the team include Elixir, Phoenix, Kubernetes and Google Cloud Platform.

Site Reliability Engineering at Duffel

As an SRE at Duffel, you’ll be part of a small team within engineering that is responsible for the reliability, performance, and resilience of our infrastructure and applications. You will be working closely with engineering teams to understand their needs and help meet the demands of our product as we scale globally.

What we're looking for

An infrastructure and systems engineering generalist who is comfortable diving deep into the weeds on different issues. Some recent examples include:
A configuration issue between Google’s Load Balancer and the HTTP server in our main Elixir application causing HTTP 5XX responses to be returned to our customers.
An issue in our OpenTelemetry pipelines causing us to silently drop spans.
An enthusiasm for both software development and systems engineering.
A high bar for code and configuration quality and readability.
A good understanding of current observability and reliability practices.
Experience of, and comfort with, running incident response.
Big-picture thinking: you can weigh trade-offs between technical work streams and business impact.
Fantastic communication skills. You're able to articulate what you're working on and why to the team in a clear and structured way.
You thrive in a collaborative environment. You believe in your own methods but keep an open mind, taking suggestions and feedback on board as well.
We run our infrastructure on Google Cloud Platform, so you’ll be helping to run a few of their products such as GKE, Cloud SQL for PostgreSQL, BigQuery, Memorystore (Redis) and more.
We manage the infrastructure and security for a segregated PCI Cardholder Data Environment, entirely managed with Google Cloud Platform services and tooling.
We follow an Infrastructure as Code approach to managing our infrastructure, using Terraform.
We follow a GitOps approach to managing our Kubernetes configuration, using ArgoCD and Helm.
We manage a high-availability metrics collection system using Grafana, Thanos & Prometheus. We’re in the process of transitioning to OpenTelemetry and Honeycomb for our application telemetry (traces and metrics).
We manage a data pipeline using Pub/Sub, Airbyte, and dbt.

Don’t worry if your experience doesn’t exactly align with this stack; we understand that skills are transferable. This is to give you an idea of what you’ll be working with if you join the team.

We’re currently driving a big shift in how we think about and monitor reliability across the engineering organisation, with a focus on early detection of customer-impacting issues.

We’re extending and standardising our use of OpenTelemetry, and introducing Honeycomb as the single place for engineers to understand how our applications are operating in production.

This project involves both technical work, on the application libraries and infrastructure that make up the OpenTelemetry pipeline, and an education piece, working to change perceptions and behaviours across engineering.

The Future

We currently run all our services from a single European region in Google Cloud. In the medium term, for performance, reliability, and data residency reasons, we’ll be starting to think about how to (re)architect our applications and infrastructure to span multiple regions, operating globally.
We deploy our application multiple times a day, but deploys are all or nothing, and when we encounter issues, rollbacks are slow. One way to address this would be to invest in CI/CD performance improvements, but we’d also like to explore alternative deployment strategies like canaries, blue/green, and traffic mirroring, and get more comfortable testing changes in production with real customer traffic.

What you can expect from us

We're dedicated to your personal growth. Our environment is comfortable, both physically and in that our ears are always open to any ideas, concerns and questions. We believe that everyone should have pride in their work, taking full ownership of it and its impact. That's why everyone who joins Duffel owns a share of the company.

We are an equal opportunities employer. We believe that the key to our success is employing a diverse team, which is why recruitment decisions are based only on your experience and skills. We value your ability to problem solve and build amazing things, so we welcome applications from everyone, regardless of age, sex, disability, sexual orientation, race, religion or belief.

Note to recruitment agencies

Duffel does not accept speculative CVs from external parties. Any unsolicited CVs sent to us will be treated as property of Duffel, and any attached terms and conditions associated with these CVs will be null and void.

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Information Technology and Engineering

Industries

Technology, Information and Internet
Travel Arrangements

Location: London
Job Type: Full-time
Category: Engineering
