Senior Machine Learning Engineer, England, United Kingdom

Overview

Senior Machine Learning Operations Engineer - London (twice a week onsite) - £500 per day (Outside IR35) - 6-month contract

We are seeking an experienced MLOps Engineer to join a globally known consumer tech company focused on building innovative, large-scale platforms. The role involves evolving and scaling the machine learning platform to support high-throughput model inference and fast iteration cycles. This position suits someone who thrives at the intersection of MLOps, Kubernetes, and cloud infrastructure, with a hands-on approach to solving complex challenges.

Responsibilities

Collaborate with ML engineers and product teams to align infrastructure with project needs.
Research and implement cutting-edge MLOps practices and mentor colleagues in cloud operations and ML engineering best practices.
Manage GPU-powered Kubernetes clusters and improve automation pipelines to ensure reliability.
Build and manage Kubernetes clusters from scratch (manual configuration with tools like kubeadm) and deploy applications with Helm.
Maintain system reliability, perform incident response, and participate in on-call rotations.
Continuously optimize infrastructure for scalability, performance, and cost.

Key Skills

MLOps & Kubernetes: Managing GPU-enabled clusters, building clusters from scratch with kubeadm, and deploying applications with Helm.
Programming: Python or Go for ML automation workflows.
Containerization: Docker and containerized application deployment.
CI/CD & Automation: ArgoCD, GitHub Actions, Infrastructure-as-Code (Terraform).
Monitoring & Observability: Prometheus, Grafana, cloud-native stacks.
ML Lifecycle: Production experience with experimentation, training, deployment, versioning, and monitoring.
Reliability & Support: On-call participation, incident response, and system optimization.

Qualifications

Extensive hands-on experience in Kubernetes and cloud infrastructure.
Experience building and managing Kubernetes clusters from scratch and deploying applications with Helm.
Strong programming skills (Python or Go) for ML automation workflows.
Proficiency with CI/CD tools (ArgoCD, GitHub Actions) and IaC (Terraform).
Familiarity with monitoring/observability tools (Prometheus, Grafana).
Strong understanding of ML lifecycle (experimentation, training, deployment, versioning, monitoring).
Willingness to participate in on-call rotations and contribute to incident response.

Location

London (twice a week onsite)

Duration

6 months

Seniority level

Mid-Senior level

Employment type

Contract

Location: England, United Kingdom
Salary: £80,000 - £100,000
Job Type: Full-time
Category: IT & Technology