Reliability Engineer
Job Description
Job Summary:
We are seeking a skilled and proactive Solace Messaging Administrator to join our Messaging team. You will be responsible for managing and supporting our enterprise messaging infrastructure built on Solace PubSub+, ensuring high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, WAN optimization, and system observability using tools like Prometheus and Grafana.
Key Responsibilities:
- Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud).
- Provide production support for messaging-related incidents, including root cause analysis and resolution.
- Monitor system performance and health using Prometheus and Grafana; proactively identify and address anomalies.
- Configure and optimize Solace across WAN environments, ensuring low-latency, secure, and reliable messaging.
- Collaborate with development and application support teams to troubleshoot message flow issues and integration problems.
- Perform capacity planning, scaling, and tuning of Solace infrastructure to meet current and future demand.
- Automate routine maintenance tasks and support continuous improvement of operational processes.
- Implement and maintain monitoring alerts, dashboards, and metrics to ensure visibility into the messaging layer.
- Ensure compliance with security policies and participate in audits and vulnerability remediation.
- Maintain accurate documentation, including topology diagrams, runbooks, and configuration baselines.
Required Skills & Qualifications:
- 3+ years of experience administering Solace PubSub+ messaging systems.
- Strong background in production support, preferably in a 24x7 enterprise environment.
- Experience working with distributed systems over WAN, with an understanding of networking, latency, and failover strategies.
- Solid experience with Prometheus and Grafana for system monitoring and alerting.
- Proficiency in troubleshooting message delivery, persistence, and topic routing.
- Experience with capacity management, performance tuning, and system scaling.
- Familiarity with Linux/Unix systems and scripting (Bash, Python, etc.).
- Strong analytical and problem-solving skills, with attention to detail.
- Excellent communication and collaboration skills.
Nice to Have:
- Experience with containerized environments (Docker/Kubernetes).
- Exposure to other messaging platforms (Kafka, RabbitMQ, MQ).
- Understanding of DevOps tools and CI/CD pipelines.
- Knowledge of cloud environments (AWS, Azure, GCP) and cloud-native Solace deployments.
Why Join Us?
- Be part of a mission-critical Messaging team enabling real-time data flows.
- Work with cutting-edge technologies and contribute to high-impact projects.
Location: City of London
Job Type: Full-time
Category: Technology