Prometheus Jobs
Overview
Looking for top Prometheus jobs? Explore the latest Prometheus monitoring and alerting roles on Haystack, the leading IT job board. Whether you're a developer, DevOps engineer, or site reliability specialist, find your perfect Prometheus job today and advance your career in cloud-native infrastructure and observability. Start your search now!
Cloud Engineer
Spectrum It Recruitment Limited
Southampton
Hybrid
Mid - Senior
£65,000

A growing UK software business is looking for a Cloud Engineer to help design, build and run secure, resilient cloud infrastructure across AWS and Azure.

You’ll play a key role in modernising platforms, migrating legacy services, and improving automation, observability and security across a multi-cloud estate.

Cloud Engineer (AWS & Azure)
Hybrid (2 days per month onsite)
Location: Southampton

What you’ll be doing

  • Designing, deploying and operating production cloud services across AWS & Azure (networking, storage, compute, app services).
  • Building secure, resilient, observable infrastructure and services (monitoring, logging, tracing).
  • Delivering cloud migration workstreams from traditional / on-prem environments into scalable cloud platforms.
  • Automating infrastructure and deployments using IaC and CI/CD, with a strong focus on repeatability and reliability.
  • Working closely with engineering and stakeholders to translate requirements into practical, supportable solutions.

What we’re looking for

  • Strong, hands-on experience with AWS & Azure in production environments.
  • Proven experience delivering cloud migrations (planning, build, cutover, optimisation).
  • Good understanding of security and operational best practice (identity, access, hardening, monitoring, incident readiness).
  • Comfort with automation and CI/CD (pipelines, deployment tooling, scripting).
  • Clear communicator who can collaborate across teams.

Technical environment (indicative)

  • AWS: EC2, ECS, S3, RDS, VPC, Lambda, IAM
  • Azure: Azure SQL, Entra ID, Azure DevOps, Container Apps, API Management, Functions
  • IaC / Automation: Terraform / OpenTofu / Scalr, Octopus Deploy (or similar), Azure DevOps, PowerShell, Azure CLI
  • Scripting: PowerShell, Python, Bash
  • Containers: Docker, container registries (e.g., ACR)
  • CI/CD: Azure DevOps Pipelines, YAML automation
  • Observability: Datadog, Grafana Cloud, OpenTelemetry, CloudWatch, Prometheus, Loki

Benefits (from day one)

  • Up to 15% Bonus scheme
  • 25 days annual leave + bank holidays
  • Pension: 4% employer contribution when you contribute 5%
  • Free onsite gym
  • EV car scheme
  • Healthcare scheme (incl. dental/eye care/treatments/diagnostics consultations)
  • Death in service (3x salary)
  • Employee Assistance Programme (24/7 counselling + legal/financial support + GP line)
  • Paid volunteering day + fundraising opportunities

Apply now or contact Chris Lynes at Spectrum IT Recruitment for more information.

Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy.

Linux Technician
Edinburgh Napier University
Edinburgh
In office
Junior - Mid
£35,000

Edinburgh Napier University is currently the number one Scottish Modern University for research in Computer Science & Informatics. Our School of Computing, Engineering and the Built Environment is intrinsically linked with industry, ensuring that our world-leading research and teaching have an impact in all areas of life. To help us maintain this status, we are looking for an experienced Linux Technician to work closely with researchers, data scientists, and IT teams to ensure the optimal performance, reliability, and scalability of our Linux systems, while encouraging students and staff alike to use Linux.

The Role

The post of Linux Technician will give you an excellent opportunity to use your previous experience managing commercial and/or academic Linux environments to:

  • Provide first- and second-line support to school academics, researchers, and students using the university's High-Performance Computing (HPC) facilities.
  • Manage user accounts, access controls, and security policies on school hardware.
  • Monitor system performance and usage, and identify and resolve issues to minimise downtime on school hardware.
  • Troubleshoot and support users with job submissions, software builds, and application performance tuning.

As part of our collaborative and supportive technical team, we anticipate that your proficiency with Linux (e.g., RHEL, CentOS, Ubuntu) and Bash scripting will allow you to effectively:

  • Promote the use of HPC across the School by providing regular workshops and demonstrations.
  • Support users with job submissions, software builds, and application performance tuning.
  • Automate routine tasks using scripts (e.g., Bash, Python, Ansible).
  • Ensure data integrity and maintain backup/recovery processes.
  • Document system configurations, procedures, and troubleshooting steps.

Furthermore, the role will also give you the chance to:

  • Collaborate with researchers and technical teams to evaluate and deploy scientific and engineering applications.
  • Assist with Linux installations throughout the School.
  • Assist with MS Windows-based hardware and software installation and deployment.

If you have previous experience managing commercial or academic Linux environments and would like to join an organisation that values high standards, good work-life balance and collaborative working, we would love to hear from you.

What we will need from you

  • Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • Previous experience managing commercial or academic Linux environments.
  • Experience with container technologies (e.g., Apptainer, Docker).
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana, Ganglia).
  • Knowledge of MS Windows, application installation, and PC hardware.
  • Strong communication and presentation skills.
  • Excellent troubleshooting and communication skills.

Benefits we offer

We offer support and recognition wherever due, as well as fantastic benefits such as a pension with employer contributions of 17.6%, a minimum of 41 days' holiday and annual pay reviews. There are also professional development opportunities, discounted access to onsite sports facilities and a wide range of other staff discounts.

Additional information

  • Salary: £31,236 - £37,694 per annum (depending on experience)
  • Contract: Permanent, Full Time, 35 hrs/wk
  • Interviews: Week commencing 30th March 2026

Please note: we are reviewing applications as they come in and may close the advert early once we find the right candidate. If this role interests you, we encourage you to apply as soon as possible.
*The University holds Disability Confident, Carer Positive and Stonewall Scotland Diversity Champion status.* *We are a flexible Employer.* *On this occasion, the University will not consider applicants requiring sponsorship for this role. International workers will therefore only be able to take up this role if they can demonstrate an alternative right to work in the UK.*

Splunk Enterprise and ITSI Expert
Stealth IT Consulting Limited
London
Hybrid
Mid - Senior
£500/day

Location: Hybrid, 3 days onsite per week in Sheffield, Birmingham, or London (UK)

Contract Duration: 8 months

Day Rate: £450 - £500 per day (Inside IR35)

Role Overview

This is a specialist role focused on designing, deploying, and optimising Splunk Enterprise and Splunk IT Service Intelligence (ITSI) in complex hybrid Kubernetes/OpenShift environments. You will handle large-scale data onboarding, build advanced ITSI service models and monitoring views, tune platform performance, implement secure governance, and integrate with modern observability pipelines. The position supports critical observability, reliability, and cost management for containerised workloads in a high-stakes enterprise setting.

Key Responsibilities

  • Design, deploy, and operate Splunk Enterprise and ITSI in hybrid Kubernetes/OpenShift environments.
  • Onboard data at scale using HEC, Universal Forwarders/Deployment Server; align to Common Information Model (CIM); enforce RBAC, retention policies, and cost guardrails.
  • Build ITSI service decompositions, KPIs (including multi-KPI), adaptive/time-based thresholds, NEAP policies, glass tables, deep dives, and service health scoring.
  • Create OpenShift-specific executive and operations views: cluster health (API/etcd), node readiness/pressure, pod restart hotspots, network/storage errors, capacity, quotas, and bursting visibility.
  • Tune search/platform performance: workload rules, concurrency limits, Data Model Acceleration (DMA), summary indexing, and scheduling optimisation.
  • Implement alerting, event enrichment, routing to ITSM/ChatOps, suppression windows, maintenance schedules, and runbook automation.
  • Govern data ingest and security: allow/deny lists, PII handling, TLS/mTLS, token/cert governance, index/role mapping, and data quality SLAs.
  • Integrate upstream sources/pipelines: OpenTelemetry (OTLP), Prometheus exporters, Fluentd/Fluent Bit/Vector, Kafka (with TLS), CMDB/ITSM enrichments, and AIOps/ML anomaly detection.

Essential Skills & Experience

  • Deep Splunk Enterprise expertise: SPL mastery, CIM alignment, KV stores/lookups/macros, saved searches, index/retention/RBAC design, search performance tuning.
  • Advanced Splunk ITSI knowledge: Service trees/decompositions, KPIs/thresholds (adaptive/time-based), NEAP tuning, glass tables, deep dives, Service Analyzer configuration.
  • Strong OpenShift/Kubernetes observability: Cluster/control-plane metrics, kube events/logs, workload/node/network/storage correlations, capacity/noisy-neighbor detection.
  • Experience with data pipelines/collectors: OpenTelemetry, Prometheus scraping, Fluentd/Fluent Bit/Vector, Kafka (TLS-secured), HEC/UF/DS onboarding.
  • Reliability & SLOs: Golden signals, rollout/rollback health checks, SLO/KPI mapping to namespaces/apps, executive/ops dashboards.
  • Performance & cost optimisation: Workload rules, DMA, summary indexing, schedule hygiene, license/cost guardrails.
  • Security & compliance: TLS/mTLS, token/cert management, PII controls, auditability, role/index mappings.
  • Automation & integrations: ITSM/ChatOps routing, runbooks, CMDB enrichment, webhook/AIOps integrations.

Preferred / Desirable

  • Hands-on experience in regulated/financial services environments.
  • Certifications: Splunk Enterprise Certified Architect, Splunk ITSI Certified Admin, or equivalent.
  • Familiarity with AIOps/ML features in Splunk for anomaly detection.
  • Previous work with container platforms (Kubernetes/OpenShift) for observability at scale.

Success Measures

  • High-quality, scalable Splunk/ITSI deployments with optimised performance and cost controls.
  • Effective service health monitoring via ITSI (accurate KPIs, glass tables, deep dives).
  • Reduced alerting noise, improved incident response through enriched routing and automation.
  • Strong governance, security compliance, and traceability in data ingest/observability pipelines.

This role is ideal for a Splunk specialist with proven expertise in ITSI and container observability who can deliver robust, production-grade monitoring solutions in dynamic hybrid environments. Engagement is on a PAYE basis via an umbrella company.

Site Reliability Engineer
Twinstream Limited
Bristol
Hybrid
Mid - Senior
£95,000

Site Reliability Engineer | Bristol, Hybrid (3 days onsite, 2 from home) | Up to £95K & Great Benefits

Ready to take on high-impact engineering challenges that actually matter? Want to work on mission-critical systems used across the UK’s most high-profile government organisations?

This is your chance to join TwinStream, a team of elite engineers who built their careers cracking complex cross-domain problems, and then built a company to do it even better.

We’re growing fast. Demand for our services is skyrocketing. And now we’re looking for a Site Reliability Engineer who’s ready to step into a role with real ownership, real influence, and real opportunities to innovate.

Why You’ll Love This Role

As our new SRE, you’ll be right at the heart of our evolving cloud and on-prem platforms. This isn’t a “keep the lights on” job; it’s a role where you’ll shape infrastructure strategy, partner closely with software and systems teams, and push performance, reliability, and automation to the next level.

You’ll help us evolve observability, enhance delivery pipelines, eliminate toil, drive reliability metrics, and make smart technical decisions that keep our systems robust as we scale.

If you love solving gnarly problems, improving how things work, and innovating at speed, this is the role for you.

Key Responsibilities of the Site Reliability Engineer:

Collaborating with Software Engineers to improve subsystem reliability and performance

Partnering with System Administrators to automate toil and cut down alert noise

Taking observability to the next level: find issues before they hit the business

Supporting development environments to boost speed and quality

Researching & evaluating tools to guide key buy-vs-build decisions

Deepening your expertise across multiple technical and business domains

Expanding your knowledge of diverse tech stacks and platforms

What You Bring

Modern configuration management tools (Ansible, Chef or similar)

Terraform

Docker containers & orchestration (Kubernetes, OpenShift, Docker Swarm)

CI/CD tooling (Jenkins or similar)

Monitoring/metrics stack (InfluxDB, Prometheus, Grafana)

MQ messaging (RabbitMQ or other AMQP solutions)

SQL & relational databases

Linux administration & shell scripting

Network security fundamentals

Cloud hosting (ideally AWS: EC2, RDS, S3, Lambda)

Bonus points for:

Experience with Java, Go, Python or similar

Knowledge of cross-domain principles & tech

Service management experience

Hands-on observability implementation

Proven ability to reduce downtime with smart reliability metrics

Why You’ll Love Working at TwinStream

Competitive salary, £65k - £95k DOE

8% employer pension contribution

Private medical healthcare (including dental & optical for the whole family)

Flexible working culture

Learning & development owned by YOU

Electric vehicle salary-sacrifice scheme

28 days holiday + bank holidays

Regular team events, plus Christmas & summer parties

Life assurance & cycle-to-work scheme

Security Clearance

You’ll need to be eligible for SC and/or DV clearance. Any offer will be subject to successful security screening.

Ready to engineer impact?

Apply now and shape the future of secure, high-performance cross-domain systems.

DevOps Engineer
Anson McCade
Newcastle upon Tyne
Hybrid
Mid - Senior
£60,000

Location: Newcastle Upon Tyne
Salary: Up to £60k

Due to the nature of the work, this role requires candidates to undergo Security Clearance. Applicants must have at least 5 years of UK address history at the point of application.

Who is our client?
Our client is a global professional services organisation delivering technology, consulting and digital solutions to businesses across a wide range of industries. They work on complex transformation projects for both private and public sector clients and provide employees with opportunities to develop their skills within an innovative and collaborative technology environment.

The Role:
As a DevOps Engineer, you will:
Work within a DevOps team to deliver environments and tools that support development and testing activities.
Automate the provisioning of integrations between systems using APIs and modern cloud technologies.
Configure and maintain tools used across the development lifecycle including CI/CD and testing platforms.
Build automated processes that enable development teams to adopt continuous integration and continuous testing practices.
Support Scrum teams by maintaining reliable, scalable development and testing environments.
Contribute to the improvement of development workflows and operational efficiency through automation and DevOps best practices.

What the Successful DevOps Engineer can expect from their duties within the role:
Delivering and maintaining development, testing and deployment environments.
Automating infrastructure provisioning and software deployment processes.
Implementing monitoring, logging and alerting solutions to maintain system performance.
Working closely with development teams to improve CI/CD pipelines and DevOps practices.

The Successful DevOps Engineer will have experience in this tech stack:
Continuous Integration and Continuous Deployment tools such as GitLab, Jira or CodeBuild
Infrastructure as Code tools such as CloudFormation, AWS CDK or Terraform
Automation tools such as Ansible for infrastructure provisioning and environment setup
Monitoring and logging tools such as CloudWatch, AppDynamics, Kibana, Splunk or Prometheus
Public or private cloud platforms and virtualisation technologies
RESTful APIs and integrating them within DevOps pipelines
Agile and Lean software delivery methodologies
Linux based environments and system administration
At least one scripting language such as Python, Node.js, TypeScript, PowerShell or Bash

Additional experience that is highly desirable but not required for this role:
Experience introducing DevOps practices to new projects or teams
Experience defining and managing DevOps implementation roadmaps
Experience contributing to technical architecture and system design
Knowledge of virtual networking or functional and performance testing

Benefits
The Successful DevOps Engineer will benefit from these perks that come with the role:
Competitive salary
25 days vacation per year
Private medical insurance
Three extra leave days per year for charitable work
Opportunity to work in a hybrid environment and collaborate with global teams

If you are interested in this role, or have the experience required, please apply below.

Reference: AMC/KPE/DVOA

Site Reliability Engineer
Twinstream Limited
Bristol
Hybrid
Mid - Senior
£65,000 - £95,000

Site Reliability Engineer | Bristol, Hybrid (3 days onsite, 2 from home) | Up to £95K & Great Benefits

Ready to take on high-impact engineering challenges that actually matter? Want to work on mission-critical systems used across the UK’s most high-profile government organisations?

This is your chance to join TwinStream, a team of elite engineers who built their careers cracking complex cross-domain problems, and then built a company to do it even better.

We’re growing fast. Demand for our services is skyrocketing. And now we’re looking for a Site Reliability Engineer who’s ready to step into a role with real ownership, real influence, and real opportunities to innovate.

Why You’ll Love This Role

As our new SRE, you’ll be right at the heart of our evolving cloud and on-prem platforms. This isn’t a “keep the lights on” job; it’s a role where you’ll shape infrastructure strategy, partner closely with software and systems teams, and push performance, reliability, and automation to the next level.

You’ll help us evolve observability, enhance delivery pipelines, eliminate toil, drive reliability metrics, and make smart technical decisions that keep our systems robust as we scale.

If you love solving gnarly problems, improving how things work, and innovating at speed, this is the role for you.

Key Responsibilities of the Site Reliability Engineer:

  • Collaborating with Software Engineers to improve subsystem reliability and performance
  • Partnering with System Administrators to automate toil and cut down alert noise
  • Taking observability to the next level: find issues before they hit the business
  • Supporting development environments to boost speed and quality
  • Researching & evaluating tools to guide key buy-vs-build decisions
  • Deepening your expertise across multiple technical and business domains
  • Expanding your knowledge of diverse tech stacks and platforms

What You Bring

  • Modern configuration management tools (Ansible, Chef or similar)
  • Terraform
  • Docker containers & orchestration (Kubernetes, OpenShift, Docker Swarm)
  • CI/CD tooling (Jenkins or similar)
  • Monitoring/metrics stack (InfluxDB, Prometheus, Grafana)
  • MQ messaging (RabbitMQ or other AMQP solutions)
  • SQL & relational databases
  • Linux administration & shell scripting
  • Network security fundamentals
  • Cloud hosting (ideally AWS: EC2, RDS, S3, Lambda)

Bonus points for:

  • Experience with Java, Go, Python or similar
  • Knowledge of cross-domain principles & tech
  • Service management experience
  • Hands-on observability implementation
  • Proven ability to reduce downtime with smart reliability metrics

Why You’ll Love Working at TwinStream

  • Competitive salary, £65k - £95k DOE
  • 8% employer pension contribution
  • Private medical healthcare (including dental & optical for the whole family)
  • Flexible working culture
  • Learning & development owned by YOU
  • Electric vehicle salary-sacrifice scheme
  • 28 days holiday + bank holidays
  • Regular team events, plus Christmas & summer parties
  • Life assurance & cycle-to-work scheme

Security Clearance

You’ll need to be eligible for SC and/or DV clearance. Any offer will be subject to successful security screening.

Ready to engineer impact?

Apply now and shape the future of secure, high-performance cross-domain systems.

AWS DevOps Engineer - Blackburn/Hybrid
Oscar Associates (UK) Limited
Blackburn
Hybrid
Senior
Private salary

AWS DevOps Engineer - Blackburn/Hybrid

We are looking for a Senior AWS DevOps Engineer to join our remote-first team and take full ownership of a modern, automated ecosystem. If you're the type of engineer who lives to 'automate everything', thrives on building resilient CI/CD pipelines, and views Infrastructure as Code as the only way to scale, this is the challenge you've been looking for. You will be the bridge between code and production, ensuring our environments are scalable, resilient, and invisible to the developers.

  • Infrastructure as Code (IaC): Design and deploy complex environments using Terraform or CloudFormation (no ClickOps allowed).
  • CI/CD Orchestration: Build and optimize sophisticated pipelines (GitHub Actions, GitLab CI, or Jenkins) to move code from commit to production with zero friction.
  • Kubernetes & Containers: Manage and scale EKS clusters, focusing on high availability, service mesh, and cost optimization.
  • Serverless & Scaling: Architect solutions using Lambda, API Gateway, and DynamoDB to handle fluctuating global traffic.
  • Security & Observability: Implement 'Shift Left' security and build deep-visibility dashboards using Prometheus, Grafana, or Datadog.

What You Bring to the Table

We are looking for a DevOps practitioner who lives in the terminal and understands that 'DevOps' is a culture, not just a job title.

  • AWS Mastery: Deep, production-level experience with the AWS ecosystem (EC2, S3, RDS, IAM, VPC, Route53).
  • Coding/Scripting DNA: Strong proficiency in Python, Go, or Bash for automation and custom tooling.
  • Container Expert: Hands-on experience with Docker and orchestration (Kubernetes/EKS is a must).
  • The MSP/Consultancy Edge: (Optional but preferred) Experience managing diverse client environments or high-traffic SaaS platforms.
  • Problem Solver: The ability to troubleshoot complex distributed systems and perform root-cause analysis on production incidents.

AWS DevOps Engineer - Blackburn/Hybrid

Oscar Associates (UK) Limited is acting as an Employment Agency in relation to this vacancy.

To understand more about what we do with your data please review our privacy policy in the privacy section of the Oscar website.

DevOps Engineer
Anson McCade
Gloucester
Hybrid
Mid - Senior
£70,000

Location: Gloucester (Hybrid / On-site)
Salary: Up to £72,000 (Dependent on Experience)
Security Clearance: Must be a sole British Citizen to be eligible.

Role Overview

As a DevOps Engineer within the Digital Intelligence division, you will be the vital link between development, testing, and operations. You will support national security missions by automating the delivery of high-assurance software. Your work ensures that code moves swiftly from developer environments to secure, hardened production platforms with maximum reliability and minimal manual intervention.

Key Responsibilities

  • CI/CD Pipeline Management: Design, build, and maintain automated build and release pipelines (e.g., Jenkins, GitLab CI) to streamline software delivery.
  • Infrastructure as Code (IaC): Use tools like Terraform or Ansible to manage cloud and on-premise infrastructure in a repeatable, version-controlled manner.
  • Container Orchestration: Deploy and manage scalable workloads using Docker and Kubernetes (EKS, AKS, or self-hosted clusters).
  • Cloud & Platform Management: Configure and secure environments across Public (AWS/Azure) and Private cloud platforms.
  • System Integration: Collaborate with software engineers to resolve complex issues across infrastructure, networking, and databases.
  • Security & Hardening: Implement DevSecOps best practices, including server hardening and automated security benchmarking (e.g., CIS).

Professional Experience

  • Proven experience in a DevOps, SRE, or Infrastructure role within a complex, high-security or regulated environment.
  • Experience working within Agile delivery teams (Scrum or Kanban).
  • Strong understanding of monitoring and logging stacks (e.g., ELK, Prometheus, Grafana).
  • Deep familiarity with version control systems, specifically Git.
  • A “Security-First” mindset regarding all aspects of automation and deployment.

Benefits & Culture

  • Flexible Working: Hybrid working model with a focus on work-life balance.
  • Career Progression: Defined pathways into Senior DevOps or Lead Architect roles.
  • Professional Development: Access to dedicated training budgets and internal mentorship.
  • Pension & Healthcare: Industry-leading pension scheme and private medical insurance.

Security & Eligibility

  • Nationality: Due to the nature of the work within Digital Intelligence, candidates must typically be British Citizens (often without dual nationality) to meet vetting requirements.
  • Clearance: Successful candidates will be required to undergo a rigorous UK government background investigation (SC or DV level).

Senior DevOps Engineer
Anson McCade
Newcastle upon Tyne
Hybrid
Senior
£60,000

Up to £60,000 GBP | Competitive Bonus | Hybrid Working
Location: Newcastle Upon Tyne, North East - United Kingdom
Type: Permanent

Senior DevOps Engineer

Our client is a leading technology consultancy, ranked No. 1 in its industry and included for over 20 years on Fortune’s World’s Most Admired Companies list. As a Senior DevOps Engineer, you will play a key role in designing, building, and automating secure, scalable environments to support the development and delivery of enterprise software solutions. Working within multi-skilled agile teams, you will implement CI/CD pipelines, infrastructure-as-code, and automated testing processes, ensuring delivery teams can develop, test, and deploy features efficiently and reliably.

This is a client-facing, delivery-focused role suited to experienced DevOps engineers who are comfortable leading technical initiatives, collaborating across teams, and driving best practices in automation, cloud deployment, and operational resilience.

You’ll have the opportunity to:

  • Design, implement, and maintain CI/CD pipelines to support agile software delivery
  • Automate infrastructure provisioning and environment setup using tools such as CloudFormation, AWS CDK, Terraform, or Ansible
  • Configure and maintain tools for source control, testing, performance, and security, including GitLab, Jenkins, Smartbear, and SonarQube
  • Implement system monitoring and alerting using tools such as CloudWatch, AppDynamics, Kibana, Splunk, or Prometheus
  • Work with cloud and virtualization technologies to deliver secure, scalable environments
  • Collaborate with Scrum teams to integrate automated testing and continuous delivery practices

Your Responsibilities:

  • Support development teams by automating workflows, integrations, and deployments
  • Ensure development, testing, and production environments are robust, resilient, and scalable
  • Apply Infrastructure-as-Code practices to provision and manage environments
  • Monitor and maintain operational performance, security, and resilience of deployed systems
  • Work closely with team leads to implement DevOps practices and facilitate continuous improvement
  • Contribute to technical design, architecture, and roadmap discussions for DevOps initiatives
  • Mentor and guide other engineers on DevOps best practices and tooling

Key Requirements:

  • Proven experience in Continuous Integration and Continuous Deployment with hands-on implementation
  • Infrastructure-as-Code experience using CloudFormation, AWS CDK, Terraform, or similar
  • Experience with automated software QA, system monitoring, and alerting
  • Knowledge of public/private cloud environments and virtualization technologies
  • Proficiency with Linux administration and one or more scripting languages (Python, Node.js, TypeScript, Bash, PowerShell)
  • Familiarity with RESTful APIs and their integration into DevOps pipelines
  • Experience working in Agile and Lean software delivery environments
  • Strong collaboration and communication skills across functional and technical teams

You will gain exposure to:

  • Cloud-native deployment and containerization technologies
  • Advanced automation for development, testing, and operations workflows
  • Enterprise-scale DevOps practices in highly regulated client environments
  • Continuous improvement of security, performance, and operational resilience
  • Leadership opportunities in introducing and defining DevOps practices across teams

Why Join?

  • Work on high-profile, enterprise-scale programmes delivering real-world impact
  • Develop your career as a Senior DevOps Engineer in a world-leading, highly admired technology consultancy
  • Collaborate with multi-skilled teams across cloud, development, and operations functions
  • Benefit from structured learning, mentoring, and career progression opportunities
  • Be part of a collaborative, inclusive, and ambitious consulting culture

Interested? Apply Now!

Reference: AON/AMC/PGDevops

DevOps Engineer (eDV Cleared)
Oscar Associates Limited
Gloucester
In office
Mid - Senior
£100,000

DevOps Engineer - eDV Cleared - Up to £100,000

Oscar Technology are working with a leading consultancy focused on delivering highly secure IT Infrastructure and Networks for government and defence organisations across the UK.

Building on their successes to date, they have plenty of ambitious goals to achieve in the coming years, with the team expected to grow exponentially in the coming months. Now is the time to join.

Due to the nature of the role, an active UK*C DV Clearance is required for eligibility.

Your day-to-day will consist of:

  • Gathering and analysing statistics from operating systems and applications regarding performance tuning and error searching.
  • Troubleshooting and providing solutions for technical issues across the stack.
  • Being part of system design consultation, platform management and capacity planning.
  • Using well-defined service level objectives to balance feature development speed and reliability.
  • Ensuring compliance with Git policies.
  • Managing and optimising AWS cloud infrastructure.
  • Designing, developing, testing, and deploying scalable and resilient solutions, including Terraform modules and Kubernetes configurations, to support evolving business needs.
  • Optimising deployment processes through automated CI/CD pipelines.

Requirements:

  • Bachelor’s degree (or equivalent) in Computer Science or related discipline.
  • Prior experience in a DevOps Engineer role using Azure OR AWS, (EC2, RDS, S3, VPC, CloudFormation)
  • Experience with IaC such as Terraform.
  • A background in containerisation and orchestration technologies such as Docker and Kubernetes.
  • Knowledge of CI/CD pipelines and tools.
  • Experience with monitoring and observability tools, including Grafana, Prometheus or Loki.

Oscar Associates (UK) Limited is acting as an Employment Agency in relation to this vacancy.

To understand more about what we do with your data please review our privacy policy in the privacy section of the Oscar website.

Senior Platform Engineer
Spectrum IT Recruitment
Not Specified
Remote or hybrid
Senior
£70,000 - £75,000

We are looking for a Senior Platform Engineer to join a growing platform team supporting a large-scale SaaS platform. This role focuses on improving reliability, scalability, and performance while helping drive a major cloud-native transition to Azure and Kubernetes.

You will work closely with engineering teams to modernise infrastructure, automate operations, and ensure highly observable and resilient systems in a production environment handling sensitive data.

Key Responsibilities

  • Design and deliver cloud migration and containerisation initiatives
  • Operate and improve production monitoring, alerting, and system reliability
  • Automate infrastructure provisioning using Pulumi and Ansible
  • Participate in incident response and on-call rotations
  • Lead blameless post-mortems and continuous improvement efforts
  • Collaborate with product engineering teams on observability and performance

Key Skills

  • Strong experience with Azure cloud architecture
  • Hands-on Kubernetes experience
  • Pulumi experience
  • Experience troubleshooting distributed systems
  • Monitoring and observability tools such as Prometheus, Grafana, or Graylog
  • Linux systems administration
  • Scripting with PowerShell, Bash, or Python
  • Experience with SQL Server, PostgreSQL, or Redis
  • Strong documentation and communication skills

Desirable

  • Software development background (C#, Go, TypeScript, Python)
  • Experience with Ansible
  • Go development experience
  • Experience in regulated environments
  • Familiarity with Windows / IIS environments

Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy.

Site Reliability Engineer
Twinstream Limited
Gloucester
Hybrid
Mid - Senior
£65,000 - £95,000

Site Reliability Engineer - Bristol, Hybrid (3 days onsite, 2 from home) - Up to £95K & Great Benefits

Ready to take on high-impact engineering challenges that actually matter? Want to work on mission-critical systems used across the UK's most high-profile government organisations?

This is your chance to join TwinStream, a team of elite engineers who built their careers cracking complex cross-domain problems, and then built a company to do it even better.

We're growing fast. Demand for our services is skyrocketing. And now we're looking for a Site Reliability Engineer who's ready to step into a role with real ownership, real influence, and real opportunities to innovate.

Why You'll Love This Role

As our new SRE, you'll be right at the heart of our evolving cloud and on-prem platforms. This isn't a "keep the lights on" job; it's a role where you'll shape infrastructure strategy, partner closely with software and systems teams, and push performance, reliability, and automation to the next level.

You’ll help us evolve observability, enhance delivery pipelines, eliminate toil, drive reliability metrics, and make smart technical decisions that keep our systems robust as we scale.

If you love solving gnarly problems, improving how things work, and innovating at speed, this is the role for you.

Key Responsibilities of the Site Reliability Engineer:

  • Collaborating with Software Engineers to improve subsystem reliability and performance
  • Partnering with System Administrators to automate toil and cut down alert noise
  • Taking observability to the next level: finding issues before they hit the business
  • Supporting development environments to boost speed and quality
  • Researching & evaluating tools to guide key buy-vs-build decisions
  • Deepening your expertise across multiple technical and business domains
  • Expanding your knowledge of diverse tech stacks and platforms

What You Bring

  • Modern configuration management tools (Ansible, Chef or similar)
  • Terraform
  • Docker containers & orchestration (Kubernetes, OpenShift, Docker Swarm)
  • CI/CD tooling (Jenkins or similar)
  • Monitoring/metrics stack (InfluxDB, Prometheus, Grafana)
  • MQ messaging (RabbitMQ or other AMQP solutions)
  • SQL & relational databases
  • Linux administration & shell scripting
  • Network security fundamentals
  • Cloud hosting (ideally AWS: EC2, RDS, S3, Lambda)

Bonus points for:

  • Experience with Java, Go, Python or similar
  • Knowledge of cross-domain principles & tech
  • Service management experience
  • Hands-on observability implementation
  • Proven ability to reduce downtime with smart reliability metrics

Why You'll Love Working at TwinStream

  • Competitive salary, £65k - £95k DOE
  • 8% employer pension contribution
  • Private medical healthcare (including dental & optical for the whole family)
  • Flexible working culture
  • Learning & development owned by YOU
  • Electric vehicle salary-sacrifice scheme
  • 28 days holiday + bank holidays
  • Regular team events, plus Christmas & summer parties
  • Life assurance & cycle-to-work scheme

Security Clearance

You'll need to be eligible for SC and/or DV clearance. Any offer will be subject to successful security screening.

Ready to engineer impact?

Apply now and shape the future of secure, high-performance cross-domain systems.

DevOps Engineer
OCC Computer Personnel
Crewe
Hybrid
Mid - Senior
Private salary

DevOps Engineer (AWS, Kubernetes, Docker)

Ready to build, scale and shape cloud infrastructure in a fast-growing tech environment? Our client is looking for a DevOps Engineer to join one of their fastest-growing areas within vehicle telematics. You'll play a key role in deploying, monitoring and supporting cloud infrastructure using modern software development and DevSecOps practices, with a real opportunity to influence both the platform and the team.

You will be:

  • Designing and managing scalable AWS infrastructure
  • Orchestrating containers using Docker & Kubernetes
  • Building CI/CD pipelines and Infrastructure as Code (Terraform)
  • Implementing monitoring and alerting (Prometheus & Grafana)
  • Designing and maintaining queue-based processing (RabbitMQ & AWS SQS)
  • Ensuring secure cloud operations and managing access controls
  • Optimising cost, scalability and performance across environments

Tech Stack: AWS, Kubernetes, Docker, Argo, Prometheus, Grafana, Terraform, RabbitMQ

You'll suit this role if you have strong commercial experience with AWS, Docker and Kubernetes, and enjoy taking ownership of projects from design through to implementation. You'll be joining a highly tech-driven, innovative organisation where you'll have genuine career progression and the support of an agile, collaborative team.

Hybrid role with modern offices in Crewe. For more info, please get in touch.

SPLUNK Enterprise and ITSI Expert
Experis
Sheffield
Hybrid
Mid - Senior
£470/day - £520/day

Location: 3 days on site in either Sheffield/Birmingham/London
Duration: 30/11/2026
Rate 529

MUST BE PAYE THROUGH UMBRELLA

Key Responsibilities

  • Design, deploy, and operate Splunk Enterprise and ITSI for hybrid Kubernetes/OpenShift environments.
  • Onboard data at scale (HEC, Universal Forwarder/Deployment Server), align to CIM, and enforce RBAC, retention, and cost guardrails.
  • Build ITSI service decompositions, KPIs/multi-KPI thresholds, NEAP policies, glass tables, deep dives, and service health scoring.
  • Create OpenShift-focused exec/ops views: cluster health (API/etcd), node readiness/pressure, pod restart hotspots, network/storage errors, capacity and quota/bursting visibility.
  • Tune search and platform performance: workload rules, concurrency, DMA, summary indexing, and scheduling hygiene.
  • Implement alerting, enrichment, routing to ITSM/ChatOps, suppression windows, maintenance schedules, and runbook automation.
  • Govern ingest and security: allow/deny lists, PII handling, TLS, token governance, index/role mapping, and data quality SLAs.
  • Integrate upstream sources and pipelines: OpenTelemetry, Prometheus exporters, Fluentd/Fluent Bit/Vector, Kafka, CMDB/ITSM enrichments, AIOps/ML anomaly detection.
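Onboarding data through HEC (Splunk's HTTP Event Collector) comes down to an authenticated JSON POST to the collector's event endpoint. The sketch below only builds the request with the Python standard library; the host, token, index and sourcetype values are hypothetical placeholders, and port 8088 is assumed because it is HEC's default:

```python
import json
import urllib.request


def build_hec_request(base_url: str, token: str, event: dict,
                      sourcetype: str, index: str) -> urllib.request.Request:
    """Build (but do not send) a POST to Splunk's HEC event endpoint."""
    payload = {"event": event, "sourcetype": sourcetype, "index": index}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # HEC tokens are sent with the "Splunk" authorization scheme.
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host, token, index and sourcetype, for illustration only.
req = build_hec_request(
    "https://splunk.example.internal:8088",
    "00000000-0000-0000-0000-000000000000",
    {"namespace": "payments", "pod": "api-7f9c", "restarts": 3},
    sourcetype="openshift:pod:restarts",
    index="ocp_metrics",
)
```

Actually sending it with `urllib.request.urlopen(req)` would need a real token whose allowed indexes include the target index, plus TLS trust for the HEC endpoint; at scale, collectors such as the Universal Forwarder, Fluent Bit or the OpenTelemetry Collector normally do this batching for you.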

Required Skills

  • Splunk Enterprise: SPL mastery, CIM alignment, KV/lookups/macros, saved searches, index/retention/RBAC design, search performance tuning.
  • Splunk ITSI: Service trees, KPIs, adaptive/time-based thresholds, NEAP tuning, glass tables, deep dives, Service Analyzer configuration.
  • OpenShift/Kubernetes observability: Cluster/control-plane metrics, kube events/logs, workload/node/network/storage correlation, capacity and noisy-neighbor detection.
  • Data pipelines & collectors: OpenTelemetry (OTLP), Prometheus scraping, Fluentd/Fluent Bit/Vector, Kafka (TLS), HEC/UF/DS onboarding.
  • Reliability & SLOs: Golden signals, rollout/rollback health checks, SLO/KPI mapping to namespaces/apps, executive and ops dashboards.
  • Performance & cost optimization: Workload rules, DMA, summary indexing, schedule optimization, license/cost guardrails.
  • Security & compliance: TLS/mTLS, token and cert hygiene, PII controls, auditability, role/index mappings.
  • Automation & integrations: ITSM/ChatOps routing, runbooks, CMDB enrichment, webhook/AIOps integrations.

Splunk and OpenShift Observability Engineer
CBSbutler Holdings Limited trading as CBSbutler
Multiple locations
Remote or hybrid
Mid - Senior
£400/day - £490/day

We’re looking for a Splunk & OpenShift Observability Engineer to design, deploy, and optimise enterprise-grade monitoring across hybrid Kubernetes and OpenShift environments.

This is a high-impact role where you’ll shape observability strategy, enhance service intelligence, and ensure platform reliability at scale - balancing performance, cost efficiency, and security governance.

You’ll work at the intersection of platform engineering, observability, and service intelligence, helping to transform raw telemetry into actionable insight. This is an opportunity to influence reliability strategy, improve operational maturity, and deliver measurable value across a modern cloud-native estate.

What You’ll Be Doing

  • Design, deploy, and operate Splunk Enterprise and ITSI across hybrid Kubernetes/OpenShift platforms
  • Onboard and normalise data at scale (HEC, Universal Forwarder, Deployment Server), aligning to CIM standards
  • Build and optimise ITSI service models: service trees, KPIs, adaptive thresholds, NEAP policies, glass tables, deep dives, and health scoring
  • Deliver OpenShift-focused executive and operational dashboards, including:
      • Cluster/API/etcd health
      • Node readiness and resource pressure
      • Pod restart trends and noisy-neighbour detection
      • Network and storage error visibility
      • Capacity, quota, and burst analysis
  • Optimise search and platform performance (workload rules, DMA, summary indexing, scheduling hygiene, concurrency tuning)
  • Implement intelligent alerting and automated routing into ITSM and ChatOps platforms, including enrichment, suppression windows, and maintenance scheduling
  • Govern data ingestion and security controls (RBAC, retention, PII handling, TLS, token governance, index and role mapping)
  • Integrate telemetry pipelines including OpenTelemetry, Prometheus, Fluentd/Fluent Bit/Vector, Kafka, CMDB and AIOps/ML solutions
  • Drive SLO/KPI alignment, golden signal monitoring, rollout/rollback health validation, and executive reporting

What You’ll Bring

  • Deep expertise in Splunk Enterprise (SPL mastery, CIM alignment, saved searches, macros, KV stores, index/retention/RBAC design, performance tuning)
  • Strong experience with Splunk ITSI (service trees, KPIs, adaptive/time-based thresholds, NEAP tuning, Service Analyzer configuration)
  • Proven OpenShift/Kubernetes observability experience across control-plane metrics, events, logs, workload correlation, and capacity management
  • Hands-on experience with telemetry pipelines (OpenTelemetry/OTLP, Prometheus exporters, Fluentd/Fluent Bit/Vector, Kafka with TLS, HEC/UF/DS onboarding)
  • Strong understanding of reliability engineering principles (golden signals, SLO design, namespace/application KPI mapping)
  • Experience optimising performance and licensing costs using workload rules, DMA, and summary indexing
  • Solid security and compliance knowledge (TLS/mTLS, certificate/token hygiene, PII controls, auditability, role/index mapping)
  • Automation and integration expertise across ITSM, ChatOps, webhooks, CMDB enrichment, and AIOps tooling

Cloud Engineer
Spectrum IT Recruitment
Southampton
Hybrid
Mid - Senior
£55,000 - £65,000

A growing UK software business is looking for a Cloud Engineer to help design, build and run secure, resilient cloud infrastructure across AWS and Azure.

You’ll play a key role in modernising platforms, migrating legacy services, and improving automation, observability and security across a multi-cloud estate.

Cloud Engineer (AWS & Azure)
Hybrid (2 days per month onsite)
Location: Southampton

What you’ll be doing

  • Designing, deploying and operating production cloud services across AWS & Azure (networking, storage, compute, app services).
  • Building secure, resilient, observable infrastructure and services (monitoring, logging, tracing).
  • Delivering cloud migration workstreams from traditional / on-prem environments into scalable cloud platforms.
  • Automating infrastructure and deployments using IaC and CI/CD, with a strong focus on repeatability and reliability.
  • Working closely with engineering and stakeholders to translate requirements into practical, supportable solutions.

What we’re looking for

  • Strong, hands-on experience with AWS & Azure in production environments.
  • Proven experience delivering cloud migrations (planning, build, cutover, optimisation).
  • Good understanding of security and operational best practice (identity, access, hardening, monitoring, incident readiness).
  • Comfort with automation and CI/CD (pipelines, deployment tooling, scripting).
  • Clear communicator who can collaborate across teams.

Technical environment (indicative)

  • AWS: EC2, ECS, S3, RDS, VPC, Lambda, IAM
  • Azure: Azure SQL, Entra ID, Azure DevOps, Container Apps, API Management, Functions
  • IaC / Automation: Terraform / OpenTofu / Scalr, Octopus Deploy (or similar), Azure DevOps, PowerShell, Azure CLI
  • Scripting: PowerShell, Python, Bash
  • Containers: Docker, container registries (e.g., ACR)
  • CI/CD: Azure DevOps Pipelines, YAML automation
  • Observability: Datadog, Grafana Cloud, OpenTelemetry, CloudWatch, Prometheus, Loki

Benefits (from day one)

  • Up to 15% Bonus scheme
  • 25 days annual leave + bank holidays
  • Pension: 4% employer contribution when you contribute 5%
  • Free onsite gym
  • EV car scheme
  • Healthcare scheme (incl. dental/eye care/treatments/diagnostics consultations)
  • Death in service (3x salary)
  • Employee Assistance Programme (24/7 counselling + legal/financial support + GP line)
  • Paid volunteering day + fundraising opportunities

Apply now or contact Chris Lynes at Spectrum IT Recruitment for more information.

Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy.

DevOps Engineer
Applause IT Recruitment Ltd
Crewe
Hybrid
Mid - Senior
Private salary

Role: DevOps Engineer
Location: Crewe (Hybrid - 3 days onsite per week)
Contract: Permanent
Hours: Full-time

A fantastic opportunity has become available for an experienced DevOps Engineer to join a growing technology team building a feature-rich platform supporting the fast-moving Electric Vehicle (EV) sector.

This is a hands-on role working within a collaborative, high-performing environment, where you’ll be responsible for improving deployment automation, scaling infrastructure, enhancing reliability, and driving best practices across modern DevOps tooling and processes.

The team is working at pace on a platform with rapidly evolving requirements, offering excellent scope for innovation, ownership, and technical progression.

The Role

As a DevOps Engineer, you will:

  • Build and maintain deployment automation across a large application portfolio
  • Deliver infrastructure provisioning and scaling using IaC (Cloud Development Kit or Terraform)
  • Support application configuration, optimisation, and migration into high-availability setups
  • Manage database deployments and contribute to performance planning
  • Carry out load testing, capacity planning, and performance monitoring
  • Investigate incidents, resolve issues, and improve system reliability
  • Work closely with developers using modern CI/CD workflows and container orchestration

This role reports into the Development Manager and can be office-based or hybrid (3 days per week onsite in Crewe).

Key Skills & Experience

Essential

  • Strong AWS experience - EC2, EKS, RDS, Aurora, networking, and cost optimisation
  • Confident with building & deploying C# / .NET applications
  • Experience with NuGet package management in CI/CD
  • Infrastructure as Code experience using CDK or Terraform
  • CI/CD experience with AWS CodePipeline or GitLab CI/CD
  • Solid Linux administration skills
  • Docker and Kubernetes experience
  • Strong understanding of TCP/IP, DNS, HTTP
  • Knowledge of security best practice for web application deployments
  • Experience with monitoring & logging tools (CloudWatch, Prometheus, Grafana)
  • Web application firewall experience (AWS WAF, Cloudflare)

Desirable

  • PostgreSQL and MSSQL administration
  • Azure cloud services
  • KongHQ / AWS API Gateway
  • Azure DevOps
  • CloudFront and other CDNs
  • SSL certificate management, configuration hardening, domain setup
  • Performance tuning and load testing
  • Experience with pipeline-based mobile app builds / remote Mac builders

What’s on Offer

  • Competitive salary and benefits package
  • Pension & life assurance
  • Employee fuel card scheme
  • Electric vehicle scheme
  • Employee assistance programme
  • Wellness and healthcare services
  • Cycle to work scheme
  • Free breakfast onsite
  • Modern purpose-built office with gym, café and bar

If you’re an experienced DevOps Engineer looking to make a real impact on scalable, cloud-driven platforms in the EV and sustainability space, click apply now.

Infrastructure Engineer
Experis
London
Hybrid
Mid
£400/day - £460/day

(Apply online only) per day - Umbrella only
Dates: 23/02/2026 - 31/07/2026
Mostly remote but will need to be flexible to travel to London, Manchester and Leicester as required

  • An infrastructure operations engineer is responsible for the preparation and support of IT operations solutions and services, physical or virtual.
  • They do so according to industry and organisational best practices, standards, service requirements and Key Performance Indicators (KPIs) throughout the product life cycle.
  • Create and maintain documentation.
  • Perform all routine tasks according to process and checklists.
  • Monitor infrastructure and applications services.
  • Alert and take appropriate action.

Required Skills

  • Asset and configuration management
  • Availability and capacity management
  • Change management
  • Coding and scripting
  • Continual service improvement
  • Incident management
  • Problem management
  • Service management framework knowledge
  • Testing
  • Technical understanding

Technical skills

  • Cloud Platforms: Azure / AWS / GCP (compute, networking, IAM, storage)
  • OS & Virtualisation: Linux (RHEL/Ubuntu), Windows Server, VMware / Hyper-V
  • Automation & IaC: Terraform, Ansible, PowerShell, Bash
  • CI/CD & DevOps Tooling: Azure DevOps / GitHub Actions / Jenkins
  • Monitoring & Observability: Prometheus, Grafana, ELK/EFK, Azure Monitor
  • Networking & Security: Firewalls, Load Balancers, VPNs, Zero Trust, Identity & Access Management

All profiles will be reviewed against the required skills and experience. Due to the high number of applications we will only be able to respond to successful applicants in the first instance. We thank you for your interest and the time taken to apply!

Back End Software Engineer
IO
Romsey
Hybrid
Mid - Senior
£45,000 - £65,000

Software Engineer (Backend) - Hybrid/Romsey - Up to £65,000 per annum (DOE) + generous benefits package

iO Associates has partnered with an R&D client specialising in defence and national security, who is looking for a Backend Software Engineer to build high-speed, mission-critical systems for our national security. You'll design, develop, test, and deploy modern backend services that connect physical and digital systems, working with Go, Kubernetes, and cloud‑native tooling to deliver reliable analytics pipelines.

What they're looking for:

  • Building high‑performance backend services in Go.
  • Shipping containerised apps to Kubernetes with Helm and ArgoCD.
  • Turning concepts into robust technical designs.
  • Shaping work in an Agile team through epics, stories, and sprint rituals.
  • Creating clear, concise technical documentation.
  • Monitoring live systems and fixing issues before they become problems.

Skillset required:

  • Strong experience with Go or Python.
  • Solid DevOps grounding: Docker, ArgoCD, GitLab CI.
  • Real‑world Kubernetes deployment experience with Helm.
  • A modern engineering mindset and familiarity with Agile delivery.

Additional skills that would be advantageous for the role:

  • Redis
  • Robot Framework
  • Prometheus / Thanos / Grafana
  • NATS, Qpid, Kafka
  • Linux networking
  • AWS

Due to the nature of the work, the client requires you to be eligible to obtain Security Clearance, with a view to obtaining DV in the future. If this role is of interest, apply via the link for consideration.

Kubernetes Engineer (Kubernetes, Jenkins, IAC)
GCS
Sheffield
Hybrid
Mid - Senior
£550/day - £650/day

My client is looking for a Kubernetes Specialist to work 3 days onsite in Sheffield (Candidate can be based in Leeds to work 3 days a week in office in Leeds).

Job Description

Top skills: Kubernetes (5+ years' experience)

You will design, deploy, and manage containerized applications within Kubernetes clusters, ensuring high availability, security, and scalability. Key requirements include proficiency in Kubernetes architecture, Linux networking, CI/CD pipelines, IaC (Terraform), and scripting (Go/Python). The role involves optimizing cluster performance, managing services, and supporting developer workflows.

Key Responsibilities

  • Cluster Management: Deploy, scale, and maintain production-grade Kubernetes clusters.
  • Infrastructure as Code (IaC): Automate environment provisioning using Terraform or Ansible.
  • CI/CD & GitOps: Build and manage pipelines using tools like Jenkins, GitLab CI/CD, Argo CD, or Flux.
  • Networking & Security: Configure networking (Ingress, CNI, service mesh) and implement security best practices (RBAC, network policies).
  • Monitoring & Observability: Implement logging and monitoring solutions (Prometheus, Grafana, Fluentd).
  • Troubleshooting: Diagnose and resolve cluster issues related to networking, storage, and application performance

Required Skills and Qualifications

  • Kubernetes Expertise: Deep knowledge of Kubernetes components, architecture, and APIs.
  • Containerization: Proficiency with Docker and container runtimes.
  • Cloud Experience: Strong experience with public cloud providers (AWS, Azure, or GCP).
  • Scripting/Coding: Ability to write automation scripts in Bash, Python, or Go.
  • Linux Knowledge: Strong understanding of Linux system administration.
  • Certifications: Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) preferred.

Ideal Candidate Profile

  • 5+ years of experience managing Kubernetes in production.
  • Experience with GitOps workflows and managing infrastructure at scale.
  • Excellent problem-solving skills and a focus on automation.

GCS is acting as an Employment Business in relation to this vacancy.

System Monitoring & Observability Engineer (Prometheus / Grafana)
SRT Marine Systems PLC
Cardiff
In office
Mid - Senior
£350/day - £500/day

SRT Marine Systems plc (SRT) is a market leader in the domain of international marine surveillance technology and systems. We are a respected, established, and ambitious multinational company headquartered in the UK with a global customer base.

The company has a worldwide impact on the marine sector, leading the next generation of maritime domain awareness (MDA) technologies, products, and systems that significantly enhance security, safety, environmental protection, and sustainability. Our customers are global and range from the largest national coast guards to individual vessel owners.

SRT is an exciting company where high-quality results are rewarded. We are ambitious and constantly seek to innovate in order to deliver better products and services to our customers. We strive to make SRT a rewarding and challenging place to work, where talented, hard-working individuals have the opportunity to make a real impact across the marine industry.

Role overview of our System Monitoring & Observability Engineer (Prometheus / Grafana)

As a System Monitoring & Observability Engineer (Prometheus / Grafana) here at SRT, you will be part of a small team tasked with implementing an end-user observability visualisation. We currently have observability dashboards in place for our engineers, using Prometheus for metrics collection and Grafana for visualisation. This initiative aims to deliver a more user-friendly solution tailored to our end users.

Our clients are located in various countries worldwide, each with differing WAN capabilities, and our system is geographically distributed on-premises across multiple sites. We are fortunate to have a team of highly experienced engineers, including UX designers, who can provide support and guidance. Our lead observability engineer will oversee and support your work throughout the project.

Key Responsibilities - System Monitoring & Observability Engineer (Prometheus / Grafana) - (not exhaustive)

Monitoring & Metrics Collection

  • Design, configure, and maintain Prometheus-based monitoring solutions
  • Develop and manage metric exporters for application and system-level data
  • Optimise Prometheus scraping configurations and retention policies

Alerting & Incident Response

  • Define and maintain alert rules based on SLIs/SLOs and performance baselines
  • Ensure alerts are actionable, with minimal false positives
  • Participate in (but not necessarily lead) on-call rotations and incident postmortems

Observability Dashboards

  • Design and maintain Grafana dashboards for real-time operational insights
  • Collaborate with engineering and product teams to create tailored visualisations
  • Provide self-service dashboard capabilities for end users

System Performance & Reliability

  • Monitor infrastructure (servers, containers, databases, services) for uptime, latency, and throughput
  • Identify bottlenecks and recommend improvements

Required Skills & Experience - System Monitoring & Observability Engineer (Prometheus / Grafana)

  • Proven experience with Prometheus (including PromQL) and Grafana in production environments
  • Strong knowledge of Linux-based systems
  • Experience writing and optimising PromQL queries for alerts and dashboards
  • Familiarity with exporters (node_exporter, blackbox_exporter, custom exporters)
  • Understanding of Alertmanager configuration and routing
  • Proficiency with Grafana dashboard creation and templating
  • Strong troubleshooting skills for infrastructure and application issues
  • Familiarity with containers (Docker)
  • Scripting skills (Bash, Python, or Go) for automation
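Alert rules "based on SLIs/SLOs" written in PromQL are often burn-rate alerts. The sketch below assumes the multiwindow approach described in the Google SRE Workbook, where a burn rate of 14.4 sustained for one hour spends 2% of a 30-day error budget (14.4 x 1h / 720h); the listing itself does not prescribe any thresholds, and the `http_requests_total` metric in the comment is a hypothetical example:

```python
# Hypothetical PromQL equivalent for the 1h window, assuming a counter
# named http_requests_total with a `code` label:
#   sum(rate(http_requests_total{code=~"5.."}[1h]))
#     / sum(rate(http_requests_total[1h])) > (14.4 * 0.001)


def burn_rate(error_ratio: float, slo: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return error_ratio / (1.0 - slo)


def should_page(long_window_error_ratio: float, short_window_error_ratio: float,
                slo: float, threshold: float = 14.4) -> bool:
    """Page only if both windows burn faster than the threshold.

    The long window (e.g. 1h) makes the signal significant; the short window
    (e.g. 5m) lets the alert clear quickly once the errors stop.
    """
    return (burn_rate(long_window_error_ratio, slo) > threshold
            and burn_rate(short_window_error_ratio, slo) > threshold)
```

Requiring both windows is one way to keep alerts actionable with few false positives: a brief spike that has already recovered fails the short-window check, and a slow trickle of errors fails the high threshold and can be left to a lower-urgency ticket rule.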

Just some of the benefits we offer

  • Highly Competitive Salary
  • Matched company pension contributions up to 5%
  • 25 days annual leave rising to 28 days with service
  • Career development opportunities
  • Company “Get to know you” days

SRT Marine Systems plc are an equal opportunity employer. We are committed to creating an inclusive working environment for all employees and actively encourage applications from all sectors of the community.

Frequently asked questions
Our job board features a variety of Prometheus roles including Monitoring Engineer, DevOps Engineer, Site Reliability Engineer (SRE), and Cloud Infrastructure Specialist positions that require Prometheus expertise.
Commonly required skills include proficiency with Prometheus for metrics collection and monitoring, experience with Grafana for dashboards, knowledge of alerting rules, familiarity with Kubernetes and cloud platforms, and strong Linux system administration skills.
Yes, you can filter job listings by experience level such as entry-level, mid-level, and senior roles to find Prometheus positions that match your career stage.
Absolutely. Many companies post remote or hybrid Prometheus jobs on our platform. You can easily filter your search results to find remote opportunities.
New Prometheus job listings are added daily as companies continuously seek monitoring and DevOps professionals skilled with Prometheus.