Senior MLOps Engineer


About Us

We are an AI-native, VC-backed startup building a proprietary multimodal foundation model with a deep understanding of retail, designed to hyper-personalise every shopper touchpoint. As we scale from research to production, we need robust infrastructure that makes our models reliable, reproducible, and observable at scale.
As a Senior MLOps Engineer, you will own the infrastructure and tooling that turns experimental models into dependable production systems. You will build the pipelines, monitoring, and deployment workflows that allow our Research Engineers to move fast without breaking things. If you want to operate at the intersection of machine learning and production systems engineering, this role is for you.
What You Will Do

- Build and maintain CI/CD pipelines for model training, evaluation, and deployment across research, staging, and production environments.
- Design and implement model registries, versioning systems, and experiment tracking to ensure full reproducibility of all model releases.
- Deploy ML workflows using tools like Airflow or similar, managing dependencies from data ingestion through model deployment and serving.
- Instrument comprehensive monitoring for model performance, data drift, prediction quality, and system health.
- Manage infrastructure as code (Terraform, or similar) for compute resources, ensuring efficient scaling across training and inference workloads.
- Collaborate with research and engineering teams to establish system SLOs/SLAs aligned with business objectives.
- Build tooling and abstractions that make it easy for Research Engineers to deploy models reliably without needing deep infrastructure knowledge.
- Ensure compliance and governance across all ML processes and workflows.
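For candidates curious what the model-registry and versioning responsibilities look like in miniature, here is a pure-Python sketch. All names (ModelRegistry, ModelRelease, the field set) are hypothetical illustrations of the idea, not a prescribed stack; in practice a tool like MLflow would fill this role.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRelease:
    """Immutable record tying a model artifact to the inputs that produced it."""
    name: str
    version: int
    data_snapshot: str   # identifier of the training-data snapshot used
    config: dict         # hyperparameters / training configuration
    artifact_digest: str  # content hash of the serialized model artifact


class ModelRegistry:
    """Tiny in-memory registry: every release is traceable to its data and config."""

    def __init__(self):
        self._releases = {}  # (name, version) -> ModelRelease

    def register(self, name, data_snapshot, config, artifact_bytes):
        # Version numbers increase monotonically per model name.
        version = sum(1 for (n, _) in self._releases if n == name) + 1
        digest = hashlib.sha256(artifact_bytes).hexdigest()
        release = ModelRelease(name, version, data_snapshot, config, digest)
        self._releases[(name, version)] = release
        return release

    def get(self, name, version):
        return self._releases[(name, version)]
```

The point of the content hash plus data-snapshot identifier is that any registered release can be audited or rebuilt later, which is what "full reproducibility of all model releases" means in practice.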
What We Look For

- Experience building and operating ML infrastructure, ideally in production environments serving real users.
- Strong proficiency in containerisation (Docker, Kubernetes) and orchestration of multi-stage ML workflows.
- Hands-on experience with ML platforms and tools such as MLflow, Kubeflow, Vertex AI, SageMaker, or similar model management systems.
- Practical knowledge of infrastructure as code, CI/CD best practices, and cloud platforms (AWS, GCP, or Azure).
- Experience with relational databases and with data-processing and query engines (Spark, Trino, or similar).
- Familiarity with monitoring, observability, and alerting systems for production ML (Prometheus, Grafana, Datadog, or equivalent).
- Understanding of ML concepts: you don't need to train models, but you should speak the language of Research Engineers and understand their constraints.
- A mindset that balances reliability with velocity: you care about reliability and reproducibility, but you also enable teams to ship fast.
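As one concrete example of the drift-monitoring skills above, the Population Stability Index (PSI) is a common, simple data-drift signal: it compares the binned distribution of live feature values against a training baseline. This is a minimal sketch in pure Python; the binning scheme and thresholds are illustrative assumptions, not the company's actual monitoring setup.

```python
import math


def population_stability_index(baseline, current, bins=10, eps=1e-4):
    """PSI between a baseline sample and a current sample of one feature.

    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major drift.
    `eps` floors empty-bin fractions so the logarithm stays defined.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant baselines

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    b = bin_fractions(baseline)
    c = bin_fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))
```

A production version would run per feature on a schedule and feed an alerting system such as Prometheus/Grafana rather than returning a bare float.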
Nice to Have

- Experience delivering API services (FastAPI, Spring Boot, or similar).
- Experience with message brokers and real-time data and event processing (Kafka, Pulsar, or similar).
Why Join Us

- You'll be part of a small, high-output team where intensity and focus are the norm.
- You'll own the infrastructure that enables research to reach customers reliably and at scale.
- You'll solve hard problems at the edge of ML systems: multi-modal models, on-device deployment, real-time inference, and retail-scale operations.
- You'll work alongside people who care deeply, move quickly, and hold a high bar for excellence.
Location:
London, England, United Kingdom
Job Type:
Full-Time
