Remote Site Reliability Engineer (SRE) Jobs

Make yourself visible and let companies apply to you.

Looking for remote Site Reliability Engineer (SRE) jobs? Discover top SRE opportunities with flexible remote work options on Haystack. Find your next role ensuring system reliability, scalability, and performance—all from anywhere. Start your remote SRE career today!

Permanent

Location: UK - Remote

Salary: £70,000 - £75,000 (+ benefits)

Skills: AWS, Terraform, CI/CD, Production SaaS experience

We are looking to recruit a Lead DevOps Engineer for a leading software company. This is a hands-on technical leadership role, ideal for someone who enjoys owning AWS infrastructure strategy while remaining close to engineering delivery.

You’ll play a key role in shaping platform standards, improving reliability, embedding security best practice, and driving automation across the organisation.

This is a fully remote, UK-based role.

The Role

Platform Architecture & Cloud Engineering

Own AWS multi-account infrastructure architecture (secure-by-design)

Define infrastructure standards across networking, IAM, logging and disaster recovery

Lead Infrastructure-as-Code strategy (Terraform preferred)

Ensure scalability, resilience and high availability across production environments

CI/CD & Release Automation

Design and optimise CI/CD pipelines

Improve deployment reliability and reduce rollback frequency

Standardise release processes across engineering teams

Implement progressive delivery practices

Reliability & Observability

Define and track SLIs/SLOs

Enhance monitoring, alerting and incident response processes

Lead post-incident reviews and root cause analysis

Drive reduction of operational toil

Security & Compliance

Embed DevSecOps controls into pipelines

Implement least-privilege IAM models

Support ISO 27001 and compliance evidence automation

FinOps & Cost Optimisation

Partner on cloud cost optimisation strategy

Improve tagging standards and cost allocation models

Implement rightsizing and automation policies

About You

5+ years’ experience in DevOps / Cloud Engineering

Strong AWS expertise (VPC, IAM, EC2, RDS, EKS, Lambda)

Proven Infrastructure-as-Code experience (Terraform preferred)

CI/CD tooling experience (GitHub Actions, GitLab CI, Jenkins)

Experience operating production SaaS environments

Strong observability tooling knowledge (Datadog, Prometheus, ELK etc.)

Incident management and root cause analysis experience

Experience in regulated or security-conscious environments is highly desirable

About Scrumconnect Consulting:
Scrumconnect Consulting is a multi-award-winning digital consultancy, recognised for delivering impactful technology solutions across UK government departments. Our work has positively influenced the lives of over 40 million UK citizens. With a strong commitment to user-centred design and agile delivery, we deliver innovative digital services that matter.

Role Description: As a DevOps Engineer, you will support the automation, reliability and security of Google Cloud Platform environments. You will implement Infrastructure as Code using Terraform and embed CI/CD pipelines through GitHub and associated tooling. You will apply Site Reliability Engineering principles to ensure system stability, observability and resilience across distributed services including data platforms and Pega-based applications. Working within agile teams, you will support secure software development life cycle practices and ensure alignment with NCSC guidance and public sector security requirements.

Preferred Tech Stack Expertise

Google Cloud Platform, Terraform, GitHub and CI/CD pipelines, Cloud Monitoring and Cloud Logging, Security Command Center, Kubernetes or Cloud Run, PostgreSQL

Responsibilities

Develop and maintain Infrastructure as Code templates for automated, repeatable deployments
Implement CI/CD pipelines to support secure and efficient application delivery
Embed monitoring, logging and alerting to improve system observability
Support vulnerability management and security compliance activities
Collaborate with development teams to embed DevOps and SRE best practices
Diagnose and resolve production issues to maintain operational continuity
Contribute to documentation and structured knowledge transfer activities

Diversity & Inclusion
At Scrumconnect Consulting, we believe that diversity drives innovation. We are committed to creating an inclusive environment where every individual is respected, valued, and supported. We welcome applications from candidates of all backgrounds and experiences, and we actively encourage applications from women, people with disabilities, under-represented communities, and those seeking flexible working arrangements.

Backend Software Engineer

Sanderson Government and Defence

You will play a key role in designing, developing, and supporting high-performance software solutions within secure and mission-critical environments. You’ll work closely with multidisciplinary teams to deliver reliable, scalable, and maintainable systems that meet demanding operational requirements. This role will involve contributing across the full development lifecycle, from early design and implementation through to deployment and long-term support, using modern cloud-native and DevOps practices.

Key Responsibilities

Design, develop, test, and maintain secure, high-performance backend services using modern programming languages.
Write clean, efficient, and maintainable code with a strong focus on reliability, performance, and security.
Translate solution architectures and business requirements into detailed technical designs and implementations.
Build, deploy, and manage containerised applications on Kubernetes using Helm and continuous deployment tools.
Support Agile delivery by contributing to sprint planning, backlog refinement, and user story development.
Produce clear, accurate, and high-quality technical documentation in line with agreed standards.
Participate actively in Agile ceremonies, including stand-ups, planning sessions, reviews, and retrospectives.
Monitor live systems, investigate performance or reliability issues, and implement fixes and enhancements.
Collaborate with DevOps and platform teams to improve automation, testing, and deployment pipelines.
Contribute to prototyping, experimentation, and the development of innovative technical solutions.
Ensure compliance with organisational processes, quality standards, and security requirements.
Support continuous improvement through code reviews, knowledge sharing, and adoption of new technologies.

Reasonable Adjustments:

Respect and equality are core values to us. We are proud of the diverse and inclusive community we have built, and we welcome applications from people of all backgrounds and perspectives. Our success is driven by our people, united by the spirit of partnership to deliver the best resourcing solutions for our clients.

If you need any help or adjustments during the recruitment process for any reason, please let us know when you apply or talk to the recruiters directly so we can support you.

Huawei RAN Engineering Specialist

Randstad Technologies

RAN Engineering Specialist

We are looking for a RAN Engineering Specialist to act as a system engineering lead for 2G and 4G networks. In this role, you will be the primary technical expert for complex fault investigations and network resilience.

As part of a high-performing Performance & Optimization team, you will directly impact network efficiency and the subscriber experience, ensuring our infrastructure remains a global leader in connectivity.

What You’ll Be Doing

Fault Leadership: Lead complex fault investigations on 2G/4G RAN sites, cells, and BSC issues.
Configuration & Scripting: Create and implement configuration scripts and commissioning files using Huawei platforms and WIM Unison.
Disaster Recovery: Design, manage, and regularly test BSC disaster recovery processes to ensure network stability.
Network Evolution: Execute overnight script implementations (such as BSC reparenting) and conduct lab-based testing.

The Skills You’ll Need

Deep Technical Expertise: Extensive knowledge of Huawei 2G/4G RAN and BSC architectures.
Platform Proficiency: Proven ability to write and implement scripts on Huawei MAE and WIM Unison platforms.
Industry Experience: A background working for a mobile network operator, vendor, or managed service supplier.
Resilience: Ability to make decisive calls under pressure to benefit the customer and maintain network uptime.

Role Details

Location: Flexible (Any Core Site).
Type: Full-Time (37.5 hours per week).

Randstad Technologies is acting as an Employment Business in relation to this vacancy.

OpenShift Telemetry Engineer

The Role

We are seeking a skilled OpenShift Telemetry Engineer to join our team. In this role, you will be responsible for implementing, managing, and optimizing the observability stack within a Red Hat OpenShift Container Platform environment to ensure system health, performance, and security.

You will act as a bridge between application monitoring and infrastructure observability, leveraging modern telemetry and data streaming tools.

Key Responsibilities

Design, implement, and maintain data pipelines to ingest and process OpenShift telemetry data (metrics, logs, and traces) at scale.
Stream OpenShift telemetry through Kafka (producers, topics, schemas) and build resilient consumer services for transformation and enrichment.
Engineer data models and routing mechanisms for multi-tenant observability while ensuring data lineage, quality, and SLA adherence across streaming layers.
Integrate processed telemetry into Splunk for dashboards, visualization, alerting, and analytics to achieve Observability Level 4 (proactive insights).
Implement schema management, governance, and versioning using Avro or Protobuf for telemetry events.
Build automated validation, replay, and backfill mechanisms to ensure data reliability and recovery.
Instrument services with OpenTelemetry, standardizing tracing, metrics, and structured logging across platforms.
Utilize LLM-based capabilities to enhance observability (e.g., query assistance, anomaly summarization, runbook generation).
Collaborate with Platform, SRE, and Application teams to integrate telemetry, alerts, and SLOs.
Ensure security, compliance, and best practices for telemetry data pipelines and observability platforms.
Document data flows, schemas, dashboards, and operational runbooks.

Required Skills & Experience

Hands-on experience building streaming data pipelines with Kafka (producers/consumers, schema registry, Kafka Connect, KSQL, Kafka Streams).
Strong experience with OpenShift / Kubernetes telemetry, including OpenTelemetry and Prometheus.
Experience integrating telemetry into Splunk (HEC, Universal Forwarder, source types, CIM) and building dashboards and alerts.
Strong data engineering skills using Python (or similar languages) for ETL/ELT, enrichment, and validation.
Experience with event schemas (Avro, Protobuf, JSON) and schema compatibility strategies.
Familiarity with observability frameworks and maturity models, driving toward Level 4 observability (proactive monitoring and automated insights).
Understanding of hybrid cloud and multi-cluster telemetry architectures.

Preferred Skills:

Security and compliance practices for data pipelines, including:
- Secret management
- RBAC
- Encryption in transit and at rest
Strong problem-solving and analytical skills.
Ability to work effectively in cross-functional teams.
Excellent communication and documentation skills.

DevOps Engineer - AWS / Azure

Job Title: DevOps Engineer

Location: Remote, UK

Salary: Circa £47,000 per annum, depending on skills and experience

Job Type: Full Time, Permanent

Working Hours: 37.5 hours per week to cover core business hours (9-5, Mon-Fri)

Working for Affinity:

We understand the importance of flexibility, wellness, performance and satisfaction - it is part of our culture

We offer opportunities to take unpaid leave and will give you one extra day’s holiday for every year you’ve worked with us, for up to 8 years

We know working from home introduces opportunities for you to do more domestic chores during the day, e.g. picking up the children from school, taking pets to the vet, etc. We don’t mind this at all, as long as we are aware of what you are doing and the work gets done!

About the Role:

We are looking for someone who is passionate about technology and who is always looking for opportunities to improve the services we provide for clients, in areas such as efficiency, cost effectiveness, security and reliability.

You’ll be pivotal in knowledge sharing internally within the business, assisting others in their assigned projects and, where necessary, supporting more junior members of Affinity to help them learn and improve.

The right DevOps engineer will have the opportunity to help grow and shape this emerging function within the team, creating a long-term career path for themselves within Affinity, with an opportunity to progress upwards.

What will your typical day look like?

You will be working on projects, such as the Cabinet Office WordPress support contract, to keep our clients’ systems up-to-date, available, resilient, secure, performing well and cost effective.

Working to improve the efficiency of our infrastructure related processes

At times you’ll need to work together with clients’ own internal teams and with any external infrastructure teams that Affinity may partner with.

Putting together infrastructure design documents at the start of projects, with a view to getting these signed off by the client and then acting as the basis for infrastructure build.

Building infrastructure on new projects through the use of Terraform, AWS CloudFormation, etc. This will also include continuous integration processes, server provisioning, etc.

Imparting knowledge and experience to other Affinity team members, both verbally and in tools such as Jira and Confluence, where we would like to build up a repository of DevOps how-tos, best practices, etc.

Opportunity to work as a team lead where other Affinity team members are involved in DevOps.

Providing AWS and Azure thought leadership and mentoring in both advisory and delivery contexts.

Supporting out-of-hours rotas to provide support coverage for contracts in place with Affinity.

About you:

General requirements:

Willingness to provide on-call rota coverage of emergency support (24/7/365)

Ability to work flexible hours from time to time, for specific projects or tasks

UK based and resident in UK for last 3 years (given some client-driven security clearance requirements)

The role is fully remote, though visits to clients and to our office in Cornwall will occasionally be required

Technical must-haves:

Commercial experience of AWS, including services such as Amazon VPC, Amazon RDS, Amazon ElastiCache, Amazon EC2, Amazon ECS/EKS, Amazon EFS, AWS IAM, Amazon CloudFront, Amazon S3, AWS CodePipeline, Amazon GuardDuty, AWS Security Hub, AWS Cost Explorer, etc.

Commercial experience of Azure, including services such as Azure Networking, Azure Cache (Redis and Memcached), App Services (running WordPress and .NET applications), Front Doors (CDN), API Services, storage services (blob storage, file storage, etc.), Azure Database services, Azure Cost Management, etc.

Terraform and/or CloudFormation scripting

Linux, Apache/Nginx

Continuous integration/deployment (CI/CD) experience including GitHub Actions

Familiarity with the AWS/Azure Well-Architected Framework and NCSC Cloud Security Principles

Security experience, including the resolution of issues found during penetration testing

Docker experience

Technical nice-to-haves:

System/server admin experience

Drupal, WordPress or Magento experience

PHP and Composer

Microsoft technologies, including .NET, Windows Server, IIS, Active Directory, MSSQL, etc.

AWS certifications (AWS Certified Solutions Architect - Professional, DevOps Engineer - Professional, speciality certifications, e.g., Database, Security, etc.)

Microsoft certifications, including Azure Fundamentals, Azure Administrator Associate, Azure DevOps Engineer Expert, Azure Security Engineer Associate, etc.

Load testing experience, including JMeter, Gatling, k6, etc.

Other HashiCorp tools, e.g., Packer, Vault, Vagrant, Consul, etc.

Provisioning tools, e.g., Puppet, Ansible, Chef, etc.

AWS Control Tower and/or Landing Zone

Experience with Google Cloud Platform

Experience with Azure Resource Manager (ARM) Templates

Please click the APPLY button to submit your CV for this role.

Candidates with the experience or relevant job titles of: Software Developer, Software Engineer, Infrastructure Engineer, AWS Infrastructure Engineer, AWS Systems Developer, Azure Software Development may also be considered for this role.

Site Reliability Engineer / SRE / Systems Engineer

A fantastic opportunity for a Site Reliability Engineer / Systems Engineer to support highly available, scalable production systems within a fast-growing technology environment, working across cloud platforms, DevOps, networking and operational resilience.

If you’ve also worked in the following roles, we’d also like to hear from you: DevOps Engineer, Operations Engineer, Cloud Engineer, Platform Engineer, Systems Engineer, Infrastructure Engineer, Production Engineer

SALARY: up to £70,000 per annum (depending on experience) + Benefits

LOCATION: Remote and Hybrid Working Options Available. You can either work remotely or, if you prefer, work hybrid between home and the office in Altrincham, Greater Manchester, North West England

JOB TYPE: Full-Time, Permanent

JOB OVERVIEW

We have a fantastic new job opportunity for a Site Reliability Engineer / Systems Engineer to join a growing technology team focused on delivering reliable, scalable and resilient platforms and services.

As a Site Reliability Engineer / Systems Engineer you will act as the vital link between operations, end users and backend development teams, ensuring system availability, performance optimisation and effective incident management across live environments.

This Site Reliability Engineer / Systems Engineer role offers the chance to work with modern cloud technologies, containerisation, observability tools and automation practices, while influencing long-term reliability improvements across business-critical systems.

APPLY TODAY

Ready to make your next career move? Apply Now for our Recruitment Team to review.

DUTIES

Your duties as the Site Reliability Engineer / Systems Engineer include:

Incident Triage and Ownership: Acting as first-line technical escalation for live production issues through to resolution or handover
System Monitoring and Availability: Maintaining high availability, performance and scalability of production platforms and services
Observability Implementation: Managing logging, monitoring, alerting and metrics to proactively identify and resolve issues
Reliability Improvements: Collaborating with development teams to translate operational insights into long-term platform resilience
Automation and Resilience: Supporting automation, incident response and continuous improvement practices
New Service Support: Ensuring new products and features are operable, reliable and scalable from day one
Cross-Team Collaboration: Working with network engineering, operations and support teams to diagnose service issues
Documentation and Reporting: Creating and maintaining runbooks, escalation guides and incident reports
Incident Prioritisation: Balancing customer impact with long-term system health and stability
Security and Compliance: Supporting compliance with security, availability and regulatory frameworks

CANDIDATE REQUIREMENTS

ESSENTIAL

Previous experience in a Site Reliability Engineer, DevOps Engineer, Systems Engineer or Operations Engineer role
Experience supporting production services at scale within a DevOps or SRE environment
Strong working knowledge of ISP-related networking concepts including DNS, DHCP, PPPoE, RADIUS and IPv4/IPv6
Experience with observability tools such as Prometheus, Grafana, ELK or Splunk
Hands-on experience with containerisation and orchestration using Docker and Kubernetes
Cloud platform experience, ideally Google Cloud Platform, including automation and scaling practices
Strong Linux administration skills with scripting capability in Bash, Python or similar
Familiarity with CI/CD pipelines and source control tools such as GitHub Actions
Understanding of security frameworks and operational resilience best practices

DESIRABLE

Experience within ISP, MSP or telecommunications environments
Familiarity with enterprise IT architectures including OSS and BSS systems
Knowledge of information security frameworks such as ISO 27001, NIST or GDPR
Experience with infrastructure automation tools such as Terraform or Ansible

BENEFITS

Smart casual dress code
Free access to gym facilities
Access to a financial wellbeing platform (on successful completion of probationary period)
Access to an employee assistance programme, Virtual GP and Elderly Care support (on successful completion of probationary period)
Access to cycle to work, childcare, and electric vehicle schemes after six months
Brand new office with excellent transport links
Supportive team culture, growth and career progression

HOW TO APPLY

To be considered for this job vacancy, please submit your CV to our Recruitment Team who will review your details. CVs of job applicants meeting this requirement will be submitted to our Client for consideration. By submitting your job application to us you are hereby giving us your express consent to submit your details to our Client for this purpose.

JOB REF: AWDO-P14376

Full-Time, Permanent Jobs, Careers and Vacancies. Find a new job and work in Altrincham, Greater Manchester, North West England. Multi-Job Board Advertising and CV Sourcing Recruitment Services provided by AWD online.

AWD online specialise in sourcing candidates and advertising vacancies on multiple job boards for companies on a non-commission basis. AWD online operates as an employment agency.

Platform Engineer - Active SC, Databricks, Trivy, Azure DevOps

Up to £510 per day - Inside IR35

Remote

6 months

My client is an instantly recognisable consultancy who urgently require a Platform Engineer with Active SC Clearance for an end client within the public sector.

Key Requirements:

Proven commercial experience working as a Platform / DevOps Engineer within the public sector.
Active SC Clearance.
Strong, commercial experience with Terraform for IaC, and with Databricks.
Proven track record configuring and managing Azure DevOps CI/CD pipelines.
Deep understanding of Azure cloud services and components.
Practical experience with Docker containerisation.
Knowledge of security scanning tooling (Trivy or similar).
Scripting proficiency in Bash (Python is desirable).
Solid understanding of Git-based version control, specifically within Azure DevOps.

Nice to have:

Immediate availability.

Hays Specialist Recruitment Limited acts as an employment agency for permanent recruitment and employment business for the supply of temporary workers. By applying for this job you accept the T&Cs, Privacy Policy and Disclaimers which can be found at (url removed)

Senior Platform Engineer

Spectrum IT Recruitment Limited

We are looking for a Senior Platform Engineer to join a growing platform team supporting a large-scale SaaS platform. This role focuses on improving reliability, scalability, and performance while helping drive a major cloud-native transition to Azure and Kubernetes.

You will work closely with engineering teams to modernise infrastructure, automate operations, and ensure highly observable and resilient systems in a production environment handling sensitive data.

Key Responsibilities

Design and deliver cloud migration and containerisation initiatives
Operate and improve production monitoring, alerting, and system reliability
Automate infrastructure provisioning using Pulumi and Ansible
Participate in incident response and on-call rotations
Lead blameless post-mortems and continuous improvement efforts
Collaborate with product engineering teams on observability and performance

Key Skills

Strong experience with Azure cloud architecture
Hands-on Kubernetes experience
Pulumi experience
Experience troubleshooting distributed systems
Monitoring and observability tools such as Prometheus, Grafana, or Graylog
Linux systems administration
Scripting with PowerShell, Bash, or Python
Experience with SQL Server, PostgreSQL, or Redis
Strong documentation and communication skills

Desirable

Software development background (C#, Go, TypeScript, Python)
Experience with Ansible
Go development experience
Experience in regulated environments
Familiarity with Windows / IIS environments

Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy.

SC DevOps Engineer

Sanderson Recruitment

DevOps Engineer (Active SC)

£504.49/day Inside IR35

Remote with occasional travel to London, Manchester or Leicester

6-month initial contract

We are looking for a DevOps Engineer to work closely with developers and IT teams to manage and optimise code releases, bringing together strong engineering principles with practical coding knowledge. You will play a key role in creating and implementing systems software, analysing data to enhance existing solutions, and driving productivity across the organisation. A solid understanding of the software development lifecycle is essential, along with hands-on experience using automation tools to build and maintain CI/CD pipelines.

Mandatory experience and skills include:

* Proven ability to learn quickly and apply new technical skills
* Hands-on experience delivering technical projects within the DevOps domain
* Experience working in an integration-focused scrum team, supported by architects and subject matter experts
* Ability to collaborate effectively with client teams remotely, supporting successful integration with external applications
* Experience with WebSphere, MQ, IBM BPM or other rules-based application technologies
* Familiarity with JIRA and Confluence
* Knowledge of scripting and automation tools such as Ant, Ansible, Bash, or Terraform
* Creating and maintaining CI/CD pipelines
* Experience with Jenkins, AWS, Git, Docker, and Liquibase

Reasonable Adjustments:

If you need any help or adjustments during the recruitment process for any reason, please let us know when you apply or talk to the recruiters directly so we can support you.

Backend Software Engineer - Bazel Java

Senior Software Engineer - Bazel / Java

We are seeking a collaborative and curious backend engineer to help drive and develop the next generation of developer infrastructure and tooling as we establish a unified, robust and scalable monorepo ecosystem for all engineers at Spotify. This role helps to support VCS and CI systems in addition to a Fleet Management product that helps developers at our client manage large-scale software changes. In the last year, we have invested a huge amount of time and effort into the next steps of that and our migration towards monorepos.

The role is located in our Platform Developer Experience (PDX) R&D Studio. The PDX R&D Studio oversees and owns cross-discipline infrastructure that cuts across all engineering at our client, including our VCS, CI systems and beyond.

YOU MUST HAVE EXPERIENCE WITH BAZEL TO BE CONSIDERED FOR THIS ROLE

Role Duties

Bring your experience and knowledge of working with Bazel and scaling monorepos to millions of lines of code to help us rethink the future of backend engineering at Spotify
Help us migrate to, and then own and maintain, Bazel and any related abstractions built to improve the developer experience
Collaborate with our adjacent infrastructure teams across the company to develop what a best-in-class monorepo experience means

Essential Skills

Strong passion for making developers highly productive
Experience developing and maintaining tools for large monorepo-based codebases
Excellent problem-solving skills
Experience working with the Bazel build system and its ecosystem (e.g. rulesets such as rules_jvm_external, IntelliJ Bazel plugin, etc.)
Fluency in Java, Python, Starlark and TypeScript

This contract role can be worked fully remotely, but you must be based in the UK. I have interview slots ready to be filled, so don't delay: apply ASAP to be considered.

Randstad Technologies is acting as an Employment Business in relation to this vacancy.

Lead Devops Engineer - SC Cleared

Role: Lead DevOps Engineer
Rate: £675 - £715 per day (Inside IR35 - Umbrella)
Location: Fully Remote (UK-Based)
Duration: Initial 6 months (extensions highly likely)
Clearance: Active SC Clearance (Required)

The Opportunity

Join a high-profile Public Sector transformation programme. We are seeking a senior-level DevOps Engineer to serve as technical lead in the design and evolution of secure, cloud-native platforms. This is a pure IaC (Infrastructure as Code) environment focused on enterprise-scale resilience and security.

Core Technical Stack

AWS Master: Deep expertise across EC2, Lambda, S3, IAM, and VPC.
Terraform Expertise: Proven track record in designing and refactoring complex IaC environments (Git/GitLab integration).
CI/CD Orchestration: Advanced experience with GitLab CI and Jenkins to automate end-to-end delivery pipelines.
Containerization: Practical experience with Docker and Kubernetes (EKS) for scaling microservices.
Scripting: High proficiency in Python, Bash, and Ansible for automation and 'code review' leadership.

Responsibilities

Lead Technical Design: Shape the architectural direction of the AWS environment.
Refactoring & Quality: Lead code reviews and promote refactoring techniques to enhance a multi-site enterprise codebase.
Security & Compliance: Implement robust security measures aligned to GDS and Home Office security standards.
Stakeholder Liaison: Act as the technical bridge between IT colleagues and business clients to track prioritisation and delivery.

Apply now for immediate consideration.

Machine Learning Engineer

Lynx Recruitment Limited

A leading technology consultancy is seeking an experienced Machine Learning Operations Engineer to help deploy AI/ML solutions into production environments across a range of client engagements. This role is focused on operationalising models (MLOps / AIOps), rather than building ML applications from scratch.

Key Responsibilities

Design and develop machine learning operational processes across diverse client environments
Analyse client requirements and provide clear technical recommendations
Manage the full ML lifecycle, including data selection, model deployment, operationalisation, and monitoring
Develop Infrastructure as Code for ML platforms using AWS CDK
Collaborate closely with data scientists to productionise ML models

Required Experience & Skills

Proven experience deploying AI/ML models into production (MLOps / AIOps)
Strong AWS experience (other cloud platforms will not be considered)
Experience with CI/CD pipelines for ML/DL models
Proficiency in Python and common ML libraries (e.g. Scikit-learn, TensorFlow, PyTorch)
Experience with container technologies (Docker, Kubernetes)
Consultancy or client-facing experience preferred
Strong communication and stakeholder management skills

Qualifications

IT / Technology-related degree (e.g. Computer Science, Engineering, Mathematics or similar)
Minimum 2:1 classification (or equivalent)
Bachelor's degree or higher required

Cloud Engineer

COMPUTACENTER (UK) LIMITED

Life on the team

Location: UK WIDE

Be part of a 1,000-strong expert community across the UK, Germany, and France under the GPS umbrella, Computacenter's leading consultancy and project delivery arm. Here, you'll grow alongside skilled peers, collaborate closely with account teams and partners, and continuously stay sharp by aligning with emerging market technologies and our strategic roadmap.

As this is a predominantly engineering role, you will have proven skills implementing modern, scalable platform solutions using a range of new and emerging technologies, developed and deployed with Cloud/DevOps tooling.

What you'll do

Engineer multi-cloud architectures across public and private environments, delivering robust, scalable, and automated solutions to support development teams
Build by following best practices and Computacenter methodologies, ensuring consistency, performance, and operational excellence
Serve as a Junior Subject-Matter Expert (SME) on Cloud Engineering within delivery teams, representing technical implementation in client meetings and highlighting risks clearly and proactively

What you'll need

Team-based experience in Cloud or DevOps environments, working within Agile frameworks like Scrum, Kanban, or Lean
Familiarity with CI/CD pipelines using tools such as Azure DevOps and GitHub Actions, and provisioning frameworks like ARM templates or Terraform
Hands-on exposure to DevOps tooling (automation, orchestration, testing), with awareness of Jenkins or GitLab, Atlassian tools (Jira, Confluence) and security tools like HashiCorp Vault
Strong communicator, able to translate technical complexity into clear insights for both technical and business audiences, while continuously learning and applying new knowledge effectively
Broader exposure to cloud platforms and native deployment tools, plus certifications in GitLab, Terraform, or Vault would be a bonus (desirable)
Current certifications such as AZ-104, AZ-400, or Terraform Associate reflect highly sought-after competencies (desirable)

Join a certified Great Place to Work, where 81% of employees say they feel it's an exceptional place to work, backed by a culture that truly values diversity, belonging, and personal growth.

DevOps Engineer - Active SC, Ansible, Terraform

Up to £500 per day (Outside IR35)

Primarily Remote

6 months

My client requires a hands‑on DevOps Engineer with active SC clearance to support the delivery and maintenance of Azure‑based IaaS/PaaS environments using Terraform and Ansible.

Key Requirements:

Proven experience as a DevOps Engineer with Active Security Clearance (SC)
Hands‑on experience designing and delivering Infrastructure as Code solutions using Terraform.
Strong track record applying Ansible for configuration management and automated environment builds.
Background implementing, supporting and optimising services within the Azure cloud platform.
Practical experience administering Windows Server estates in operational environments.
Solid exposure to Linux server management, including configuration, patching and diagnostics.
Experience resolving infrastructure, OS and network‑level issues, including components such as Active Directory, DNS and IIS.

Nice to have:

Immediate availability
Flexibility to get to site

If you’re interested in this role, click ‘apply now’ to forward an up-to-date copy of your CV, or call us now.
If this job isn’t quite right for you, but you are looking for a new position, please contact us for a confidential discussion about your career.

Hays Specialist Recruitment Limited acts as an employment agency for permanent recruitment and employment business for the supply of temporary workers. By applying for this job you accept the T&C’s, Privacy Policy and Disclaimers which can be found at (url removed)

Senior Platform Engineer

Spectrum IT Recruitment

You will work closely with engineering teams to modernise infrastructure, automate operations, and ensure highly observable and resilient systems in a production environment handling sensitive data.

Key Responsibilities

Design and deliver cloud migration and containerisation initiatives
Operate and improve production monitoring, alerting, and system reliability
Automate infrastructure provisioning using Pulumi and Ansible
Participate in incident response and on-call rotations
Lead blameless post-mortems and continuous improvement efforts
Collaborate with product engineering teams on observability and performance

Key Skills

Strong experience with Azure cloud architecture
Hands-on Kubernetes experience
Pulumi experience
Experience troubleshooting distributed systems
Monitoring and observability tools such as Prometheus, Grafana, or Graylog
Linux systems administration
Scripting with PowerShell, Bash, or Python
Experience with SQL Server, PostgreSQL, or Redis
Strong documentation and communication skills

Desirable

Software development background (C#, Go, TypeScript, Python)
Experience with Ansible
Go development experience
Experience in regulated environments
Familiarity with Windows / IIS environments

Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy.

Key Responsibilities

Participate in the growth, industrialization, and DevOps of the Airops Fast Data platform.
Develop event-driven data interfaces between Airops products, the data platform, and third-party applications or middleware, within an MS Azure environment.
Work within a DevOps team and participate in the maintenance of the Azure platform.
Utilize tools like Jenkins, ArgoCD, and Helm charts for continuous integration and continuous deployment (CI/CD) processes.
Work with Kubernetes and OpenShift for container orchestration and management.
Develop and maintain event streaming solutions using Kafka and Azure Event Hub.

Ideal Candidate Profile

Java Development: Previous experience as a Java developer, with expertise in Java Quarkus or Java Spring Boot (Java version 21).
Cloud Native: Experience developing in a cloud environment (MS Azure or AWS).
Event Streaming: Experience with event streaming infrastructure (e.g., Kafka, Azure Event Hub).
DevOps: Experience with DevOps infrastructure and components, including Jenkins, ArgoCD, and Helm charts.
Containerization: Proficiency in containerization and orchestration tools such as Kubernetes and OpenShift.
Software Maintenance: Experience in software maintenance, including incident resolution, root cause analysis, and post-mortem analysis.
Agile Methodologies: Experience working in Agile environments (Scrum, SAFe).

Platform Operations Engineer

Job Title: Platform Operations Engineer
Department: AI Lab - Platform Operations Team
Location: Remote
Employment Type: Full-time, Permanent

Overview
We are actively looking to secure multiple Platform Operations Engineers to join Experis, part of the ManpowerGroup - a global organisation with over $20 billion in annual revenue and more than 1,000 consultants on assignment across 20 clients worldwide.
Experis UK is in an exciting growth phase, with ambitious expansion plans and deep partnerships across multiple industries. Our model is personal and career-focused: we invest in our consultants through continuous training, technology exposure, and collaborative development.

About the Role
IBM’s AI Lab is building next-generation AI platforms and services. To support this mission, we’re growing our Platform Operations Team, responsible for the cloud infrastructure that powers our AI services. As a Platform Operations Engineer, you’ll work across AWS, Kubernetes, and internal automation tools to ensure the platform runs smoothly, securely, and efficiently.
This role suits someone who enjoys working at the intersection of software development and operations - writing code, automating infrastructure, and supporting high-performance machine learning environments.

Key Responsibilities

Deploy, manage, and monitor applications on AWS EKS (Kubernetes)
Build and maintain Helm charts, manifests, and ArgoCD configurations
Contribute Python code for internal tooling, automation, and services
Manage CI/CD pipelines (e.g. Concourse, GitLab CI)
Troubleshoot issues in networking, permissions, and application performance
Work with development teams to streamline deployment and scaling of AI systems
Maintain secure cloud environments through thoughtful IAM and Terraform configurations

Essential Skills and Experience

Strong hands-on experience with Kubernetes (deployment, debugging, Helm)
Intermediate to advanced Python development skills
Familiarity with CI/CD pipelines, especially writing and debugging them
Solid understanding of AWS services (EKS, IAM, S3)
Confident with Linux-based environments and containerization (Docker)

Ideal (Bonus) Skills

Experience with Helm, ArgoCD, and GitOps workflows
Practical knowledge of Terraform for infrastructure-as-code
Understanding of Kubernetes networking, ingress management, and certificate handling
Exposure to OAuth/OpenID, certificates, and authentication proxies

Splunk and OpenShift Observability Engineer

CBSbutler Holdings Limited trading as CBSbutler

We’re looking for a Splunk & OpenShift Observability Engineer to design, deploy, and optimise enterprise-grade monitoring across hybrid Kubernetes and OpenShift environments.

This is a high-impact role where you’ll shape observability strategy, enhance service intelligence, and ensure platform reliability at scale - balancing performance, cost efficiency, and security governance.

You’ll work at the intersection of platform engineering, observability, and service intelligence, helping to transform raw telemetry into actionable insight. This is an opportunity to influence reliability strategy, improve operational maturity, and deliver measurable value across a modern cloud-native estate.

What You’ll Be Doing

Design, deploy, and operate Splunk Enterprise and ITSI across hybrid Kubernetes/OpenShift platforms
Onboard and normalise data at scale (HEC, Universal Forwarder, Deployment Server), aligning to CIM standards
Build and optimise ITSI service models: service trees, KPIs, adaptive thresholds, NEAP policies, glass tables, deep dives, and health scoring
Deliver OpenShift-focused executive and operational dashboards, including:
Cluster/API/etcd health
Node readiness and resource pressure
Pod restart trends and noisy-neighbour detection
Network and storage error visibility
Capacity, quota, and burst analysis
Optimise search and platform performance (workload rules, DMA, summary indexing, scheduling hygiene, concurrency tuning)
Implement intelligent alerting and automated routing into ITSM and ChatOps platforms, including enrichment, suppression windows, and maintenance scheduling
Govern data ingestion and security controls (RBAC, retention, PII handling, TLS, token governance, index and role mapping)
Integrate telemetry pipelines including OpenTelemetry, Prometheus, Fluentd/Fluent Bit/Vector, Kafka, CMDB and AIOps/ML solutions
Drive SLO/KPI alignment, golden signal monitoring, rollout/rollback health validation, and executive reporting

What You’ll Bring

Deep expertise in Splunk Enterprise (SPL mastery, CIM alignment, saved searches, macros, KV stores, index/retention/RBAC design, performance tuning)
Strong experience with Splunk ITSI (service trees, KPIs, adaptive/time-based thresholds, NEAP tuning, Service Analyzer configuration)
Proven OpenShift/Kubernetes observability experience across control-plane metrics, events, logs, workload correlation, and capacity management
Hands-on experience with telemetry pipelines (OpenTelemetry/OTLP, Prometheus exporters, Fluentd/Fluent Bit/Vector, Kafka with TLS, HEC/UF/DS onboarding)
Strong understanding of reliability engineering principles (golden signals, SLO design, namespace/application KPI mapping)
Experience optimising performance and licensing costs using workload rules, DMA, and summary indexing
Solid security and compliance knowledge (TLS/mTLS, certificate/token hygiene, PII controls, auditability, role/index mapping)
Automation and integration expertise across ITSM, ChatOps, webhooks, CMDB enrichment, and AIOps tooling

Backend Software Engineer

Key Responsibilities

Design, develop, test, and maintain secure, high-performance backend services using modern programming languages.
Write clean, efficient, and maintainable code with a strong focus on reliability, performance, and security.
Translate solution architectures and business requirements into detailed technical designs and implementations.
Build, deploy, and manage containerised applications on Kubernetes using Helm and continuous deployment tools.
Support Agile delivery by contributing to sprint planning, backlog refinement, and user story development.
Produce clear, accurate, and high-quality technical documentation in line with agreed standards.
Participate actively in Agile ceremonies, including stand-ups, planning sessions, reviews, and retrospectives.
Monitor live systems, investigate performance or reliability issues, and implement fixes and enhancements.
Collaborate with DevOps and platform teams to improve automation, testing, and deployment pipelines.
Contribute to prototyping, experimentation, and the development of innovative technical solutions.
Ensure compliance with organisational processes, quality standards, and security requirements.
Support continuous improvement through code reviews, knowledge sharing, and adoption of new technologies.

Reasonable Adjustments:

If you need any help or adjustments during the recruitment process for any reason, please let us know when you apply or talk to the recruiters directly so we can support you.

Frequently asked questions

A Site Reliability Engineer (SRE) is an IT professional who applies software engineering principles to ensure scalable and reliable operation of software systems, bridging the gap between development and operations teams.

Our job board specializes in remote Site Reliability Engineer positions, but availability can vary. Be sure to check each job listing for remote work details to confirm if the position is fully remote or requires occasional on-site presence.

Common skills for remote SRE roles include proficiency in cloud platforms like AWS, Azure, or GCP, experience with containerization and orchestration tools such as Kubernetes, strong coding abilities in languages like Python or Go, and expertise in monitoring and incident response.

To apply, simply create an account on our platform, upload your resume, and click the 'Apply' button on the job listing that interests you. Some listings may redirect you to the employer's application portal.

Yes! You can create customized job alerts by specifying keywords, locations, and job types such as remote. We'll notify you via email as soon as matching SRE roles become available.