Site Reliability Engineer
Overview
Create the future of travel with us
Whether it’s visiting the people closest to us, starting an exciting adventure, or taking a career-defining business trip, travel is an essential part of our lives. Yet we've all experienced the aches and pains of getting to our destination. Today, more than 4 billion airline passengers rely on technology that hasn't kept up with the expectations of the modern connected traveller.
That’s why we’ve started to rebuild the infrastructure that underpins the travel industry. We’re on a mission to unravel travel — simplifying systems and building the tools that will make the future of travel effortless.
We were part of Y Combinator's S18 cohort and are backed by Benchmark, Blossom, Index Ventures and Kima Ventures, a fantastic set of investors that has helped build some of the world's largest companies.
Our team in London is growing and we’re looking for talented people to join us on our journey.
Engineering at Duffel
We're building tools to simplify travel distribution, search and booking. What does this actually mean? It's one common and seamless API. This brings huge technical challenges, as we need to design and build a beautiful API before integrating with hundreds of airlines. Along with that, we need to navigate the differing needs and systems of each airline whilst building a fantastic developer experience to go with it.
The tools used on the team include Elixir, Phoenix, Kubernetes and Google Cloud Platform.
Site Reliability Engineering at Duffel
As an SRE at Duffel, you’ll be part of a small team within engineering that is responsible for the reliability, performance, and resilience of our infrastructure and applications. You will be working closely with engineering teams to understand their needs and help meet the demands of our product as we scale globally.
What we're looking for
- An infrastructure and systems engineering generalist who is comfortable diving deep into the weeds on different issues. Some recent examples include:
  - Debugging a configuration issue between Google’s Load Balancer and the HTTP server in our main Elixir application that caused HTTP 5XX responses to be returned to our customers.
  - Debugging an issue in our OpenTelemetry pipelines that caused us to silently drop spans.
- An enthusiasm for both software development and systems engineering.
- A high bar for code and configuration quality and readability.
- A good understanding of current observability and reliability practices.
- Experience of, and comfort with, running incident response.
- Big-picture thinking: you can weigh technical work streams against business impact and make the trade-offs.
- Fantastic communication skills. You're able to articulate what you're working on and why to the team in a clear and structured way.
- You thrive in a collaborative environment. You believe in your own methods but keep an open mind, taking suggestions and feedback on board as well.
- We run our infrastructure on Google Cloud Platform, so you’ll be helping to run a few of their products such as GKE, CloudSQL for PostgreSQL, BigQuery, Memorystore (Redis) and more.
- We manage the infrastructure and security for a segregated PCI Cardholder Data Environment, entirely managed with Google Cloud Platform services and tooling.
- We follow an Infrastructure as Code approach to managing our infrastructure, using Terraform.
- We follow a GitOps approach to managing our Kubernetes configuration, using ArgoCD and Helm.
- We manage a high-availability metrics collection system using Grafana, Thanos & Prometheus. We’re in the process of transitioning to OpenTelemetry and Honeycomb for our application telemetry (traces and metrics).
- We manage a data pipeline using Pub/Sub, Airbyte, and dbt.
Don’t worry if your experience doesn’t exactly align with this stack; we understand that skills are transferable. This is to give you an idea of what you’ll be working with if you join the team.
We’re currently driving a big shift in how we think about and monitor reliability across the engineering organisation, with a focus on early detection of customer-impacting issues.
We’re extending and standardising our use of OpenTelemetry, and introducing Honeycomb as the single place for engineers to understand how our applications are operating in production.
This project involves both technical work, on the application libraries and infrastructure that make up the OpenTelemetry pipeline, and an education piece, working to change perceptions and behaviours across engineering.
The Future
- We currently run all our services from a single European region in Google Cloud. In the medium term, for performance, reliability, and data residency reasons, we’ll be starting to think about how to (re)architect our applications and infrastructure to span multiple regions, operating globally.
- We deploy our application multiple times a day, but deploys are all or nothing, and when we encounter issues, rollbacks are slow. One way to address this would be to invest in CI/CD performance improvements, but we’d also like to explore alternative deployment strategies like canaries, blue/green, and traffic mirroring, and get more comfortable testing changes in production with real customer traffic.
What you can expect from us
We're dedicated to your personal growth. Our environment is comfortable, both physically and in that our ears are always open to any ideas, concerns and questions. We believe that everyone should have pride in their work, taking full ownership of it and its impact. That's why everyone who joins Duffel owns a share of the company.
We are an equal opportunities employer. We believe that the key to our success is employing a diverse team, which is why recruitment decisions are based only on your experience and skills. We value your ability to solve problems and build amazing things, so we welcome applications from everyone, regardless of age, sex, disability, sexual orientation, race, religion or belief.
Note to recruitment agencies
Duffel does not accept speculative CVs from external parties. Any unsolicited CVs sent to us will be treated as property of Duffel, and any attached terms and conditions associated with these CVs will be null and void.
Seniority level
- Mid-Senior level
Employment type
- Full-time
Job function
- Information Technology and Engineering
Industries
- Technology, Information and Internet
- Travel Arrangements
- Location: London
- Job Type: Full-time
- Category: Engineering