Senior Machine Learning Engineer

Overview

Senior Machine Learning Operations Engineer – London (2x a week onsite) – £500 p/d (Outside IR35) – 6-month contract

We are seeking an experienced MLOps Engineer to join a globally known consumer tech company focused on building innovative, large-scale platforms. The role involves evolving and scaling the machine learning platform to support high-throughput model inference and fast iteration cycles. This position suits someone who thrives at the intersection of MLOps, Kubernetes, and cloud infrastructure, with a hands-on approach to solving complex challenges.

Responsibilities

  • Collaborate with ML engineers and product teams to align infrastructure with project needs.
  • Research and implement cutting-edge MLOps practices and mentor colleagues in cloud operations and ML engineering best practices.
  • Manage GPU-powered Kubernetes clusters and improve automation pipelines to ensure reliability.
  • Build and manage Kubernetes clusters from scratch (manual configuration with tools like kubeadm) and deploy applications with Helm.
  • Maintain system reliability, perform incident response, and participate in on-call rotations.
  • Continuously optimize infrastructure for scalability, performance, and cost.

Key Skills

  • MLOps & Kubernetes: Managing GPU-enabled clusters, including building them from scratch with kubeadm and deploying applications with Helm.
  • Programming: Python or Go for ML automation workflows.
  • Containerization: Docker and containerized application deployment.
  • CI/CD & Automation: ArgoCD, GitHub Actions, Infrastructure-as-Code (Terraform).
  • Monitoring & Observability: Prometheus, Grafana, cloud-native stacks.
  • ML Lifecycle: Production experience with experimentation, training, deployment, versioning, and monitoring.
  • Reliability & Support: On-call participation, incident response, and system optimization.

Qualifications

  • Extensive hands-on experience in Kubernetes and cloud infrastructure.
  • Experience building and managing Kubernetes clusters from scratch and deploying applications with Helm.
  • Strong programming skills (Python or Go) for ML automation workflows.
  • Proficiency with CI/CD tools (ArgoCD, GitHub Actions) and IaC (Terraform).
  • Familiarity with monitoring/observability tools (Prometheus, Grafana).
  • Strong understanding of ML lifecycle (experimentation, training, deployment, versioning, monitoring).
  • Willingness to participate in on-call rotations and contribute to incident response.

Location

London (2x a week onsite)

Duration

6 months

Seniority level

Mid-Senior level

Employment type

Contract

Location: England, United Kingdom
Salary: £80,000 - £100,000
Job Type: Full-time
Category: IT & Technology