Senior Site Reliability Engineer – Data Platform Engineering
We’re looking for a Site Reliability Engineer to help us accomplish our mission to improve and extend lives by learning from the experience of every person with cancer. Are you ready to be the next changemaker in cancer care?

Flatiron Health is a healthtech company using data for good to power smarter care for every person with cancer, around the world. Flatiron partners with cancer centers in the US, Europe and Asia to transform patients’ real-life experiences into real-world evidence and create a more modern, connected oncology ecosystem. Our multidisciplinary teams include oncologists, data scientists, software engineers, epidemiologists, product experts and more. Flatiron Health is an independent affiliate of the Roche Group.

What You’ll Do

We’re seeking a Site Reliability Engineer with a data engineering focus to help architect, build, and maintain reliable, scalable pipelines that form the foundation of our standardized data platform. You’ll partner closely with platform and data engineering teams to ensure that systems are robust, observable, and built with operational excellence in mind.

As an SRE embedded in the data space, you will:
- Design and build reliable, scalable, and maintainable data pipelines using Databricks, Airflow, and GitLab CI/CD
- Implement SLOs, SLIs, and monitoring for data pipeline health, latency, throughput, and quality
- Establish best practices and standards for data infrastructure (e.g. versioning, testing, rollout)
- Develop automation and leverage best-in-class AI tooling to reduce toil associated with orchestration, error detection, retries, and alerting
- Enable secure and scalable data access patterns and controls for data stores like Snowflake and Amazon RDS
- Collaborate with platform and application teams to guide infrastructure decisions for data workflows
- Optimize cost, reliability, and performance of data compute and storage layers in cloud environments (e.g. AWS)
- Participate in on-call rotations for the data platform stack and contribute to shared incident response processes

Who You Are

You’re an engineer with 5+ years of experience working in DevOps, platform engineering, data engineering, or SRE roles. You have a passion for building reliable systems and the curiosity to work at the intersection of data and infrastructure.

Some of your key qualifications include:
- Strong experience with workflow orchestration tools (Airflow preferred)
- Familiarity with Databricks, Spark, or other distributed compute platforms
- Experience building and maintaining CI/CD pipelines (GitLab preferred); experience with data transformation projects (e.g. using dbt) is a plus
- Competency in writing infrastructure-as-code (e.g. Terraform, Ansible)
- Proficiency in Python or another scripting language used in data automation
- Experience designing and monitoring data SLAs and building systems that are observable and testable
- Familiarity with cloud environments like AWS, including services like S3, IAM, EC2, and EMR
- Proficiency with containerized deployments (e.g. using Docker; Kubernetes is a plus)
- You value high-quality documentation and operational runbooks
- You’re collaborative, have strong written and verbal communication skills, and care about building systems that work for both engineers and the business
- Location: Greater London, England, United Kingdom
- Job Type: Full-time