Senior Machine Learning Engineer
Overview
Senior Machine Learning Operations Engineer – London (twice a week onsite) – £500 per day (Outside IR35) – 6-month contract
We are seeking an experienced MLOps Engineer to join a globally known consumer tech company focused on building innovative, large-scale platforms. The role involves evolving and scaling the machine learning platform to support high-throughput model inference and fast iteration cycles. This position suits someone who thrives at the intersection of MLOps, Kubernetes, and cloud infrastructure, with a hands-on approach to solving complex challenges.
Responsibilities
- Collaborate with ML engineers and product teams to align infrastructure with project needs.
- Research and implement cutting-edge MLOps practices and mentor colleagues in cloud operations and ML engineering best practices.
- Manage GPU-powered Kubernetes clusters and improve automation pipelines to ensure reliability.
- Build and manage Kubernetes clusters from scratch (manual configuration with tools like kubeadm) and deploy applications with Helm.
- Maintain system reliability, perform incident response, and participate in on-call rotations.
- Continuously optimize infrastructure for scalability, performance, and cost.
Key Skills
- MLOps & Kubernetes: Managing GPU-enabled clusters built from scratch with kubeadm, with applications deployed via Helm.
- Programming: Python or Go for ML automation workflows.
- Containerization: Docker and containerized application deployment.
- CI/CD & Automation: ArgoCD, GitHub Actions, Infrastructure-as-Code (Terraform).
- Monitoring & Observability: Prometheus, Grafana, cloud-native stacks.
- ML Lifecycle: Production experience with experimentation, training, deployment, versioning, and monitoring.
- Reliability & Support: On-call participation, incident response, and system optimization.
Qualifications
- Extensive hands-on experience with Kubernetes and cloud infrastructure.
- Experience building and managing Kubernetes clusters from scratch and deploying applications with Helm.
- Strong programming skills (Python or Go) for ML automation workflows.
- Proficiency with CI/CD tools (ArgoCD, GitHub Actions) and IaC (Terraform).
- Familiarity with monitoring/observability tools (Prometheus, Grafana).
- Strong understanding of the ML lifecycle (experimentation, training, deployment, versioning, monitoring).
- Willingness to participate in on-call rotations and contribute to incident response.
Location
London (2x a week onsite)
Duration
6 months
Seniority level
Mid-Senior level
Employment type
Contract
- Location: England, United Kingdom
- Salary: £80,000 - £100,000
- Job Type: Full-time
- Category: IT & Technology