Remote Kubernetes Jobs

Looking for top remote Kubernetes jobs? Explore the best opportunities in cloud-native infrastructure and container orchestration with our specialized job board. Whether you're a Kubernetes engineer, DevOps professional, or cloud architect, find remote roles that match your skills and advance your career from anywhere. Start your search for remote Kubernetes jobs today and join leading companies embracing flexible, distributed teams.

Backend Software Engineer

Sanderson Government and Defence

You will play a key role in designing, developing, and supporting high-performance software solutions within secure and mission-critical environments. You’ll work closely with multidisciplinary teams to deliver reliable, scalable, and maintainable systems that meet demanding operational requirements. This role will involve contributing across the full development lifecycle, from early design and implementation through to deployment and long-term support, using modern cloud-native and DevOps practices.

Key Responsibilities

Design, develop, test, and maintain secure, high-performance backend services using modern programming languages.
Write clean, efficient, and maintainable code with a strong focus on reliability, performance, and security.
Translate solution architectures and business requirements into detailed technical designs and implementations.
Build, deploy, and manage containerised applications on Kubernetes using Helm and continuous deployment tools.
Support Agile delivery by contributing to sprint planning, backlog refinement, and user story development.
Produce clear, accurate, and high-quality technical documentation in line with agreed standards.
Participate actively in Agile ceremonies, including stand-ups, planning sessions, reviews, and retrospectives.
Monitor live systems, investigate performance or reliability issues, and implement fixes and enhancements.
Collaborate with DevOps and platform teams to improve automation, testing, and deployment pipelines.
Contribute to prototyping, experimentation, and the development of innovative technical solutions.
Ensure compliance with organisational processes, quality standards, and security requirements.
Support continuous improvement through code reviews, knowledge sharing, and adoption of new technologies.

Reasonable Adjustments:

Respect and equality are core values to us. We are proud of the diverse and inclusive community we have built, and we welcome applications from people of all backgrounds and perspectives. Our success is driven by our people, united by the spirit of partnership to deliver the best resourcing solutions for our clients.

If you need any help or adjustments during the recruitment process for any reason, please let us know when you apply or talk to the recruiters directly so we can support you.

Cloud and Infrastructure Engineer

Job title: Cloud and Infrastructure Engineer

Location: Remote, team based in Leeds

Reports to: Global Head of Infrastructure, UK

Remuneration: Up to £65k GBP (CWE), pension, 25 days' holiday

Main Purpose

Our client, a UK-based global consulting organisation, is looking for a Cloud and Infrastructure Engineer with a passion for working with innovative cloud technologies, containerised hosting environments and public cloud platform engineering to join their Global Cloud Services Team (GCS). The GCS team are responsible for providing first-class cloud infrastructure and engineering for our client’s managed cloud service solution to our clients and colleagues worldwide.

GCS Engineers undertake infrastructure builds to specification, are responsible for infrastructure incidents from response through to remediation, and carry out maintenance activities such as patching and technical application upgrades to the infrastructure estate and cloud-hosted systems.

Key Responsibilities

* To carry out infrastructure builds to specification and internal team processes and documentation

* To provide incident response as and if required – either from monitoring and alerting or as a third-line escalation from support desk - and to undertake investigative and diagnostic processes, formulate and execute action plans, and work in teams to progress incidents to resolution

* Patch and upgrade application and platform components to pre-arranged schedules to keep within vendor support and latest security versions

* To take part in the incident management call rota to respond to incidents at 3rd line/from alerts on infrastructural platforms.

* To continuously review, improve, simplify and expedite existing practices, processes, tooling, runbooks, build documents - and more - to identify opportunities for improvement and optimisation

* To take part in GCS’ proactive monitoring regime using cloud-native tooling and to respond with remedial actions to resolve issues and prevent incidents

* To stay personally up to date with new and emerging technologies, most notably cloud architecture and containerisation platforms, and to leverage the experience and knowledge within the team (and wider) to make GCS an organization-leading technical resource

Required Skills/Technologies

* Containerization experience – preferably IBM RedHat OpenShift but also any experience working with Docker or Kubernetes in any form, demonstrable understanding of principles. Ability and demonstrable experience with building OpenShift clusters and/or building new environments into OpenShift clusters a distinct advantage. Experience managing OpenShift clusters, adding and removing nodes and compute resource, and troubleshooting as part of incident response also an advantage.

* Public cloud platform engineering skills - preferably AWS and/or Azure - implementation and demonstrable builds experience a distinct advantage

* Serverless architectures and experience with cloud-native toolsets designed for immutable infrastructure

* Infrastructure as code experience – preferably Ansible but also Terraform, Chef, Puppet, AWS CloudFormation, Azure Resource Manager, Google Cloud Deployment Manager etc

* Configuration management experience – preferably Ansible but also Chef, Puppet etc

* Source control and CMDB – preferably GitHub and Azure DevOps but also SVN, Jira, Confluence

* Linux environment experience – Debian-based, Red Hat-based

* Cybersecurity practices and frameworks – notably demonstrable experience with ISO27001, SOC1/2, CyberEssentials+ and/or the NCSC 14 Principles of Cyber Security

Skills and Qualifications

* Team players only in GCS – our strength is in our ethos and our culture and our quality of service

* Ability to write strong and concise technical documentation to add to and improve the GCS technical library

* Ability to troubleshoot, diagnose, replicate-simulate-quantify, triage and investigate technical issues and problems, and the ability to identify root causal factors

* Professional and committed, able to work unsupervised for task completion, disciplined, organised

* Applicants that enjoy playing a mentoring role to others or bringing others up to technical grade within the team will be developed, encouraged and rewarded

* Empathic to colleagues and service requestors and willing to be as helpful as possible in a service provision role where customers are internal as well as external

* This is not primarily a client-facing role, but applicants that are comfortable to deal directly with clients and can explain technical issues to client end users may be asked to assist in this regard from time to time

* Ability to succeed working in virtual teams with colleagues on the other side of the world

Place of Work:

This role is either:

* Home-based (homeworker);

* Based at the Leeds Office (Thorpe Park, East Leeds LS15); or

(By agreement dependent upon the personal circumstances of the applicant)

Some travel to a local office for face-to-face meetings with line management is required from time to time.

Some travel to client premises may be required if a specific need is identified, although this is uncommon.

Hours of Work:

This permanent role works a standard eight-hour day, five days a week. As a global role, some work outside the regular business day is required, and the salary reflects this commitment.

We Commit to offer You:

* A commitment to your personal development, allowing you to grow with the company.

* Training and development to improve your skills, including internal team mentorship programs.

* A professional, friendly, inclusive workplace.

* Flexible working options, including working from home

Forward Deployed Engineer

Location: Remote – Must be UK based
Type: Full-Time
Salary: £60,000-£166,000 depending on experience
Industry: AI / Enterprise GenAI / Agentic Systems

Are you building production-grade GenAI systems — not just demos?

We’re partnering with a cutting-edge AI organisation delivering transformative, agentic AI solutions to enterprise clients. They are looking for a Senior GenAI Solutions Engineer to embed with customers, lead deployments, and build bespoke AI systems that drive measurable business impact.

This is a high-ownership, customer-facing role at the intersection of AI engineering, cloud deployment, and strategic solution delivery.

What You’ll Be Doing

Design and deploy enterprise-grade RAG and multi-agent systems

Build transformative agentic AI solutions using tools such as LangChain and LangGraph

Lead technical delivery from prototype to stable production release

Deploy ML systems across AWS, Azure, or GCP environments

Embed within client teams to co-develop AI solutions aligned to KPIs

Conduct technical debugging and root cause analysis

Rapidly prototype innovative AI systems in ambiguous environments

Drive adoption and demonstrate measurable business outcomes

Implement best practices across AI engineering, DevOps, and MLOps

What We’re Looking For

Proven experience building GenAI applications (RAG, multi-agent systems, fine-tuning)

Strong understanding of Model Context Protocols, A2A Protocols, Agent Developer Kits, and LLM orchestration and evaluation

Experience deploying production-grade ML/GenAI systems in cloud environments (AWS, Azure, or GCP)

Strong hands-on data science expertise (pandas, scikit-learn, PyTorch, etc.)

DevOps and infrastructure experience including Docker, Kubernetes, Terraform, CI/CD pipelines (GitHub Actions, Jenkins, CircleCI), and GitOps workflows

Full-stack engineering capability (Python, JavaScript or similar stacks)

Experience in customer-facing technical roles (5+ years)

Ability to communicate complex AI concepts to both technical and non-technical audiences

Strong risk awareness and proactive problem-solving mindset

Ideal Profile

Have scoped and delivered complex systems in fast-moving, ambiguous environments

Understand how AI model behaviour impacts product experience

Can move seamlessly between engineering teams and executive stakeholders

Take ownership of delivery and outcomes — not just code

Education

Graduate degree in Computer Science, Engineering, Statistics, Operations Research, or equivalent practical experience.

Why Join?

Work on truly transformative AI deployments

High autonomy and ownership

Fully Remote working options

Direct client impact with measurable KPI outcomes

Opportunity to shape next-generation agentic AI systems

Fast-growing, innovation-driven environment

First-round interviews are being arranged; please send in your CV for a confidential discussion.

AI Architect – Generative AI / LLM / Cloud (Enterprise Scale)

We are hiring an experienced AI Architect to lead the design and governance of enterprise-grade AI/ML and Generative AI solutions across data, application and cloud infrastructure layers.

This is a senior-level architecture role focused on delivering production-ready LLM, RAG and agentic AI platforms within complex enterprise environments.

Key Responsibilities

* Define and own AI reference architecture across data ingestion, model orchestration, inference services and application integration

* Architect and deploy LLM solutions (GPT, BERT, Transformers) including RAG pipelines and vector databases

* Lead LLMOps / MLOps strategy including model lifecycle, CI/CD for ML, model registry and monitoring

* Design scalable cloud-native AI solutions in Azure, AWS or GCP

* Ensure governance, Responsible AI, security, compliance and non-functional requirements (NFRs)

* Engage senior stakeholders and shape AI roadmaps from discovery through delivery

Required Experience

* Proven experience as an AI Architect / Machine Learning Architect / GenAI Architect

* Hands-on expertise with LLMs, RAG, LangChain, LangGraph, prompt engineering

* Strong cloud experience: Azure OpenAI, AWS Bedrock/SageMaker or GCP Vertex AI

* Experience with Kubernetes, Docker, microservices, API integration

* Strong knowledge of Python, MLOps, LLMOps, CI/CD, model monitoring

* Experience delivering enterprise AI solutions at scale

Desirable

* Experience with vector databases (Pinecone, FAISS)

* AI governance, compliance and security architecture

* Azure AI / AWS ML / Kubernetes certifications

This role suits a technically hands-on architect who has delivered production AI platforms, not purely research or academic profiles.

Apply now to discuss full details

Platform Engineer with key skills in AWS, Kubernetes, Docker and back end services (Python, node, Go etc) is sought on a remote basis by a high growth FinTech.

Working at the forefront of B2B SaaS financial security, this Platform Engineer will be working with a close-knit technical team to monitor, manage and improve business-critical applications and infrastructure, with the aim of facilitating improvements in application deployment & scalability.

This role would suit a DevOps or Platform Engineer with a solid background in software engineering and infrastructure who can bring experience working with the latest automation and config management tooling to create a fully automated deployment environment.

In return this Platform Engineer can expect a dynamic, engaging, R&D driven culture with extensive progression opportunities and the chance to own the platform functionality of this high growth business.

This Platform Engineer should have most of the following key skills:

Strong experience in owning scalability, ideally within a product-led environment
Solid back-end services experience (Go, Python, Node etc)
IaC experience - Terraform, Ansible, Red Hat etc
AWS exposure
Strong containerization experience - Docker, Kubernetes
Basic VM management experience
Experience delivering new tooling for infrastructures
Relational database exposure - Postgres, MySQL etc
An agile, flexible personality who feels comfortable working in a fast paced environment

This Platform Engineer will receive

Base salary of up to £100,000
Long term remote working (one day a month on site max)
Generous equity stake with a clear business exit strategy
Bonus scheme
Private healthcare
Flexible working hours
Clear progression and personal development opportunities
25 days annual leave plus bank holidays
Private pension
Regular salary reviews

So if you are a Platform Engineer and like the idea of joining a market leading company that offers excellent project ownership skills within a collaborative, autonomous environment please apply now to be considered.

Platform Engineer
Remote
Linux, Java, Python, MongoDB, Postgres, automation, node, infrastructure, Kubernetes, ansible, AWS, Terraform

DevOps Engineer, Assistant Vice President

This job is with State Street, an inclusive employer and a member of myGwork – the largest global platform for the LGBTQ+ business community. Please do not contact the recruiter directly.

Role Summary:

We are seeking a skilled DevOps Engineer to collaborate with our development and operations teams to design, implement, and manage scalable infrastructure and deployment pipelines. The ideal candidate will have experience with cloud platforms, CI/CD processes, automation, and monitoring tools, with a strong emphasis on optimizing system performance and ensuring reliable, secure, and efficient deployments. You will work closely with cross-functional teams to build robust, scalable, and secure digital asset products that meet our business and technical requirements.

Role Description:

Design and Implement CI/CD Pipelines: Develop, maintain, and optimize continuous integration and continuous deployment pipelines to streamline the development and release processes.

Infrastructure Management: Manage and maintain cloud-based infrastructure on Azure Cloud Platform, including provisioning, configuration, and automation of resources.

Automation and Scripting: Create and manage automation scripts for deployment, configuration, and monitoring using tools such as Ansible, Terraform, or Puppet.

Monitoring and Troubleshooting: Implement monitoring solutions to ensure system health and performance. Troubleshoot and resolve issues related to applications, systems, and infrastructure.

Collaboration: Work closely with development, QA, and operations teams to understand requirements, deliver solutions, and address technical challenges.

Security and Compliance: Ensure best practices for security, compliance, and data protection across all systems and processes.

Documentation: Create and maintain clear documentation for infrastructure, deployment processes, and best practices.

Performance Optimization: Analyze system performance and implement improvements to enhance the efficiency and scalability of applications and infrastructure.

Core/Must have skills :

Proven track record of managing and deploying applications and infrastructure.

Cloud Platforms: Proficiency in Azure cloud services.

CI/CD Tools: Experience with Harness, GitLab CI, GitHub Actions, or similar tools.

Configuration Management: Knowledge of tools such as Ansible, Chef, Puppet, or SaltStack.

Infrastructure as Code: Experience with Terraform, CloudFormation, or similar tools.

Scripting Languages: Proficiency in scripting languages like Bash, Python, or Ruby.

Containers and Orchestration: Familiarity with Docker, Kubernetes, or OpenShift.

Version Control Systems: Experience with Git or other version control systems.

Monitoring and Logging: Knowledge of monitoring tools like Prometheus, Grafana, ELK Stack, or similar.

Ability to work independently and manage multiple tasks or projects.

Good to have skills

Certifications: Cloud certifications (AWS Certified DevOps Engineer, Azure DevOps Solutions Expert, etc.) are a plus.

Additional Experience: Experience with microservices architecture and API management.

About State Street Across the globe, institutional investors rely on us to help them manage risk, respond to challenges, and drive performance and profitability. We keep our clients at the heart of everything we do, and smart, engaged employees are essential to our continued success.

We are committed to fostering an environment where every employee feels valued and empowered to reach their full potential. As an essential partner in our shared success, you’ll benefit from inclusive development opportunities, flexible work-life support, paid volunteer days, and vibrant employee networks that keep you connected to what matters most. Join us in shaping the future.

As an Equal Opportunity Employer, we consider all qualified applicants for all positions without regard to race, creed, color, religion, national origin, ancestry, ethnicity, age, disability, genetic information, sex, sexual orientation, gender identity or expression, citizenship, marital status, domestic partnership or civil union status, familial status, military and veteran status, and other characteristics protected by applicable law.

Discover more information on jobs at StateStreet.com/careers

Read our CEO Statement


QA Automation Engineer, Assistant Vice President

This job is with State Street, an inclusive employer and a member of myGwork – the largest global platform for the LGBTQ+ business community. Please do not contact the recruiter directly.

Role Summary :

We are seeking an experienced QA Automation Engineer to join our Digital Asset development team. This is a hands‑on, quality engineering role responsible for designing, building, and maintaining automated test frameworks and test suites for mission‑critical digital asset platforms within a regulated financial services environment. You will work closely with cross-functional teams to build robust, scalable, and secure digital asset products that meet our business and technical requirements.

Role Description :

Design, develop, and maintain automated test frameworks for digital asset platforms, with a strong focus on API and backend testing

Create and execute automated test suites covering digital asset workflows, including custody events, token lifecycle actions, transfers, settlements, and reporting

Validate data integrity and reconciliation across digital and traditional systems, ensuring consistency between on‑chain/off‑chain records and downstream financial systems

Collaborate closely with developers and product teams, contributing to testability, acceptance criteria, and automation‑ready designs

Build and maintain integration and regression test coverage for microservices‑based platforms deployed in containerized environments

Support non‑functional testing relevant to financial services, including performance, resilience, and negative/error scenarios

Participate in defect triage and root‑cause analysis, working with engineering teams to identify issues and prevent recurrence

Ensure testing approaches align with enterprise SDLC, risk, and compliance expectations, including evidence capture for audits and control reviews

Core/Must have skills :

Proven experience as a QA Automation Engineer within financial services or other regulated environments

Strong hands‑on experience with test automation for APIs and backend services (e.g. REST APIs, service‑to‑service integrations)

Solid understanding of QA practices across the full SDLC, including Agile delivery models and automation‑first testing strategies

Experience validating data accuracy, integrity, and consistency, particularly in systems handling financial or transactional data

Strong analytical skills, attention to detail, and the ability to collaborate effectively with developers and product stakeholders

Good to have skills :

Working knowledge of digital asset concepts, such as custody, tokenization, wallets, or blockchain‑based transaction flows

Experience testing digital asset or blockchain‑adjacent platforms, including on‑chain/off‑chain integration points

Exposure to event‑driven or streaming architectures used for digital asset data distribution and reporting

Familiarity with containerized or microservices‑based platforms, and testing services deployed in Kubernetes environments

Experience with performance and resilience testing for APIs supporting high‑value or time‑sensitive transactions

Understanding of risk, controls, and audit requirements in financial services technology.

Discover more information on jobs at StateStreet.com/careers


About Scrumconnect Consulting
Scrumconnect Consulting is a multi-award-winning digital consultancy, recognised for delivering impactful technology solutions across UK government departments. Our work has positively influenced the lives of over 40 million UK citizens. With a strong commitment to user-centred design and agile delivery, we deliver innovative digital services that matter.

Role Description
As a Solutions Architect, you will design robust, secure and scalable solutions within the Google Cloud Platform ecosystem. You will translate business and technical requirements into end-to-end architecture designs covering applications, APIs, data platforms and integrations. You will ensure alignment with GDS standards, the Technology Code of Practice and public sector governance frameworks. You will support multi-cloud portability considerations where required and ensure reusable architecture patterns that future-proof the digital estate. You will provide architectural leadership, assurance and mentoring to strengthen in-house capability.

Preferred Tech Stack Expertise
Google Cloud Platform services, API management platforms such as Apigee, Python and web application frameworks, PostgreSQL, Terraform, Kubernetes or Cloud Run, secure architecture design principles

Responsibilities

Design end-to-end cloud-native solutions aligned to functional and non-functional requirements
Ensure security, scalability and resilience across application and data architectures
Define reusable architecture patterns and promote future-proof design principles
Provide assurance through technical design reviews and governance boards
Support API life cycle design and integration within GCP environments
Collaborate with delivery teams to ensure architectural alignment throughout implementation
Lead mentoring and capability development initiatives

Diversity & Inclusion
At Scrumconnect Consulting, we believe that diversity drives innovation. We are committed to creating an inclusive environment where every individual is respected, valued, and supported. We welcome applications from candidates of all backgrounds and experiences, and we actively encourage applications from women, people with disabilities, under-represented communities, and those seeking flexible working arrangements.

Senior Software Engineer

Reference: BH-377p
Working Hours: Full-time
Job Type: Permanent
Salary: Competitive
Location: Remote or Hybrid - London office

About The Client:

Our client is a rapidly growing Infrastructure-as-a-Service (IaaS) provider driving digital transformation.

Key Responsibilities:

Architecture & System Design
Design and evolve scalable backend services and product components.
Make sound architectural decisions across APIs, services, and data layers.
Lead delivery from design through production operation.

Full-Stack Development
Build backend systems primarily in Python.
Develop secure, performant APIs supporting AI workflows.
Contribute to modern web applications where required (e.g., Next.js).

Reliability & Performance
Improve monitoring, observability, and system resilience.
Optimise performance and support production stability.

Engineering Leadership
Maintain high coding standards and test coverage.
Contribute to code reviews and documentation.
Mentor engineers and support technical growth.

Collaboration
Work closely with Product and Design to deliver scalable solutions.
Communicate technical trade-offs and manage cross-team dependencies.

Essential skills and requirements:

Strong experience building production backend systems and APIs in Python (Flask or similar).
Proven ownership of asynchronous or compute-intensive workflows.
Experience delivering full-stack features (e.g., Next.js).
Practical understanding of AI lifecycle workflows (training, evaluation, deployment, inference).
Solid system design knowledge including API design, SQL/NoSQL data systems, and security.
Experience managing systems in production (monitoring, debugging, incident response).
Familiarity with Git, CI/CD, Docker, and Kubernetes.

Desirable Skills:

Exposure to LLMs or generative AI platforms.
Experience with model lifecycle management or AI observability.
Understanding of GPU-based or distributed systems.
Experience building developer platforms or workflow orchestration tools.

What's on Offer:

Competitive salary + bonus
Flexible remote or hybrid working
Wellbeing benefits
Clear progression in a high-growth environment
Strong ownership and collaborative culture

Software Engineer

Spectrum IT Recruitment

A growing software business is hiring a Software Engineer to support continued growth and rising demand for its platform.

Software Engineer
Remote | Up to £60,000 | Java or Kotlin | Spring Boot

This is a back-end focused role, ideal for an engineer who enjoys building reliable, scalable applications and wants to work on software used by well-known customers. You will be involved in the design, development and delivery of new features, as well as improving core products in a fast-moving environment. The role is fully remote, with the option to attend the Newbury office around once a month if desired.

What you will be doing

Developing back-end applications using Java or Kotlin
Building software with Spring Boot
Designing and delivering new features
Enhancing existing products and platform capability
Working across the full software development lifecycle
Supporting QA and UAT feedback through to release
Collaborating with project, account and delivery teams to understand requirements and turn them into practical solutions
Writing clean, maintainable, high-quality code

What they are looking for

Strong commercial experience in software engineering
Good hands-on experience with Java or Kotlin
Strong experience with Spring Boot
A back-end development background
Experience building scalable systems in a modern development environment
Strong problem-solving skills and attention to detail
Ability to work independently in a remote setup

Nice to have

Microservices experience
Docker or Kubernetes
RabbitMQ or other messaging tools
MongoDB or other NoSQL database experience
Experience in telecoms, billing or transaction-led systems

Package

Up to £60,000
Remote working
Optional monthly office time in Newbury
24 days holiday plus birthday off
Private medical
Life assurance
Critical illness cover
Employee assistance programme
Contributory pension

Apply now or contact Chris Lynes at Spectrum IT Recruitment

Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy

Site Reliability Engineer / SRE / Systems Engineer

A fantastic opportunity for a Site Reliability Engineer / Systems Engineer to support highly available, scalable production systems within a fast-growing technology environment, working across cloud platforms, DevOps, networking and operational resilience.

If you’ve worked in any of the following roles, we’d also like to hear from you: DevOps Engineer, Operations Engineer, Cloud Engineer, Platform Engineer, Systems Engineer, Infrastructure Engineer, Production Engineer

SALARY: up to £70,000 per annum (depending on experience) + Benefits

LOCATION: Remote and Hybrid Working Options Available. You can either work remotely or, if you prefer, work hybrid between home and the office in Altrincham, Greater Manchester, North West England

JOB TYPE: Full-Time, Permanent

JOB OVERVIEW

We have a fantastic new job opportunity for a Site Reliability Engineer / Systems Engineer to join a growing technology team focused on delivering reliable, scalable and resilient platforms and services.

As a Site Reliability Engineer / Systems Engineer you will act as the vital link between operations, end users and backend development teams, ensuring system availability, performance optimisation and effective incident management across live environments.

This Site Reliability Engineer / Systems Engineer role offers the chance to work with modern cloud technologies, containerisation, observability tools and automation practices, while influencing long-term reliability improvements across business-critical systems.

APPLY TODAY

Ready to make your next career move? Apply Now for our Recruitment Team to review.

DUTIES

Your duties as the Site Reliability Engineer / Systems Engineer include:

Incident Triage and Ownership: Acting as first-line technical escalation for live production issues through to resolution or handover
System Monitoring and Availability: Maintaining high availability, performance and scalability of production platforms and services
Observability Implementation: Managing logging, monitoring, alerting and metrics to proactively identify and resolve issues
Reliability Improvements: Collaborating with development teams to translate operational insights into long-term platform resilience
Automation and Resilience: Supporting automation, incident response and continuous improvement practices
New Service Support: Ensuring new products and features are operable, reliable and scalable from day one
Cross-Team Collaboration: Working with network engineering, operations and support teams to diagnose service issues
Documentation and Reporting: Creating and maintaining runbooks, escalation guides and incident reports
Incident Prioritisation: Balancing customer impact with long-term system health and stability
Security and Compliance: Supporting compliance with security, availability and regulatory frameworks

CANDIDATE REQUIREMENTS

ESSENTIAL

Previous experience in a Site Reliability Engineer, DevOps Engineer, Systems Engineer or Operations Engineer role
Experience supporting production services at scale within a DevOps or SRE environment
Strong working knowledge of ISP-related networking concepts including DNS, DHCP, PPPoE, RADIUS and IPv4/IPv6
Experience with observability tools such as Prometheus, Grafana, ELK or Splunk
Hands-on experience with containerisation and orchestration using Docker and Kubernetes
Cloud platform experience, ideally Google Cloud Platform, including automation and scaling practices
Strong Linux administration skills with scripting capability in Bash, Python or similar
Familiarity with CI/CD pipelines and source control tools such as GitHub Actions
Understanding of security frameworks and operational resilience best practices

DESIRABLE

Experience within ISP, MSP or telecommunications environments
Familiarity with enterprise IT architectures including OSS and BSS systems
Knowledge of information security frameworks such as ISO27001, NIST or GDPR
Experience with infrastructure automation tools such as Terraform or Ansible

BENEFITS

Smart casual dress code
Free access to gym facilities
Access to a financial wellbeing platform (on successful completion of probationary period)
Access to an employee assistance programme, Virtual GP and Elderly Care support (on successful completion of probationary period)
Access to cycle to work, childcare, and electric vehicle schemes after six months
Brand new office with excellent transport links
Supportive team culture, growth and career progression

HOW TO APPLY

To be considered for this job vacancy, please submit your CV to our Recruitment Team who will review your details. CVs of Job Applicants meeting this requirement will be submitted to our Client for consideration. By submitting your job application to us you are hereby giving us your express consent to submit your details to our Client for this purpose.

JOB REF: AWDO-P14376

Full-Time, Permanent Jobs, Careers and Vacancies. Find a new job and work in Altrincham, Greater Manchester, North West England. Multi-Job Board Advertising and CV Sourcing Recruitment Services provided by AWD online.

AWD online specialise in sourcing candidates and advertising vacancies on multiple job boards for companies on a non-commission basis. AWD online operates as an employment agency.

awd online

Lead DevOps Engineer - SC Cleared

Role: Lead DevOps Engineer
Rate: £675 - £715 per day (Inside IR35 - Umbrella)
Location: Fully Remote (UK-Based)
Duration: Initial 6 months (extensions highly likely)
Clearance: Active SC Clearance (Required)

The Opportunity

Join a high-profile Public Sector transformation programme. We are seeking a senior-level DevOps Engineer to serve as technical lead in the design and evolution of secure, cloud-native platforms. This is a pure IaC (Infrastructure as Code) environment focused on enterprise-scale resilience and security.

Core Technical Stack

AWS Master: Deep expertise across EC2, Lambda, S3, IAM, and VPC.
Terraform Expertise: Proven track record in designing and refactoring complex IaC environments (Git/GitLab integration).
CI/CD Orchestration: Advanced experience with GitLab CI and Jenkins to automate end-to-end delivery pipelines.
Containerization: Practical experience with Docker and Kubernetes (EKS) for scaling microservices.
Scripting: High proficiency in Python, Bash, and Ansible for automation and 'code review' leadership.

Responsibilities

Lead Technical Design: Shape the architectural direction of the AWS environment.
Refactoring & Quality: Lead code reviews and promote refactoring techniques to enhance a multi-site enterprise codebase.
Security & Compliance: Implement robust security measures aligned to GDS and Home Office security standards.
Stakeholder Liaison: Act as the technical bridge between IT colleagues and business clients to track prioritisation and delivery.

Apply Now for immediate consideration

TPBN1_UKTJ

Machine Learning Engineer

Core Duties

Design and develop machine learning models for traditional ML use cases (forecasting, classification, anomaly detection) and GenAI/LLM applications
Lead experimentation cycles: define hypotheses, design experiments, evaluate results, and iterate rapidly while adhering to governance requirements
Transition validated experiments into production-ready solutions, working closely with other engineers on deployment and monitoring
Build and optimise ML pipelines using AWS services and experiment tracking tools
Develop and integrate LLM-powered solutions for tracing, evaluation, and production monitoring
Implement robust experiment tracking, model versioning, and reproducibility practices with full audit trails
Design feature engineering approaches and contribute to feature store development
Support production models through monitoring, performance analysis, and continuous improvement
Apply responsible AI practices, including model explainability and fairness assessment
Present experiment findings and production outcomes to stakeholders, articulating operational and strategic value
Mentor junior colleagues and share learnings across the team

About You

You will have experience in many of the following:

Hands-on experience developing and deploying ML models in Python using frameworks such as scikit-learn, XGBoost, PyTorch, or TensorFlow
Strong experience with AWS ML services (SageMaker, Lambda, S3) in production environments
Strong experiment design skills: hypothesis formulation, A/B testing methodology, and statistical evaluation
Proven track record transitioning models from experimentation to production with appropriate governance and quality controls
Experience with experiment tracking and MLOps tooling (MLflow, Weights & Biases, Data Version Control)
Experience developing LLM/GenAI applications, including prompt engineering and RAG architectures

It would be great if you also had experience in some of these, but if not we’ll help you with them:

Experience with advanced LLM techniques: agents, tool use, and agentic workflows
Experience with vector databases (Pinecone, Weaviate, pgvector) for RAG applications
Experience with feature stores (Feast, AWS Feature Store)
Experience with containerisation (Docker) and orchestration (Kubernetes, ECS)
Familiarity with Infrastructure as Code (Terraform, CloudFormation)
Experience with data processing frameworks (Spark, Dask) for large-scale workloads
Understanding of data governance and compliance frameworks

TPBN1_UKTJ

Senior Software Engineer (PHP)

Permanent full time

Build software used by thousands. Influence technical direction. Mentor others.

IRIS is one of the UK’s largest privately held software companies and a major European provider of payroll and HR solutions. We’re now looking for a Senior Software Engineer to join our Dataplan team.

This is a hands on senior role combining deep technical contribution with delivery ownership and mentoring responsibility.

You’ll help shape and scale a cloud-based payroll and HR platform serving thousands of customers.

The Role

You’ll spend the majority of your time writing high quality code while also:

* Driving end-to-end feature delivery

* Partnering with Product on roadmap and prioritisation

* Mentoring and supporting junior engineers

* Leading best practice adoption (testing, CI/CD, observability)

* Contributing to architectural decisions

* Managing technical debt and platform health

* Supporting incident resolution and continuous improvement

Our Tech Stack

Core:

* PHP (Laravel)

* ReactJS

* JavaScript

* Relational databases

* Kubernetes

* Docker

* Linux

Desirable:

* React Native

* MariaDB

* Jenkins

* GitLab

About You

* 5+ years’ experience in software engineering

* Strong experience building and shipping scalable web applications

* Comfortable in Agile environments with sprint-based delivery

* Strong knowledge of CI/CD and DevOps principles

* Experience supporting production systems

* Passion for mentoring and raising engineering standards

Bonus points for experience in regulated sectors (finance, health, govtech) or working with AI tools to improve engineering productivity.

What You’ll Get

* Opportunity to influence technical direction

* Exposure to enterprise scale systems

* Clear progression opportunities

If you’re a senior engineer who enjoys owning delivery, shaping solutions, and developing others — we’d love to hear from you.

Please note:

We occasionally close vacancies early if we receive a high volume of applications, so we recommend you apply as soon as possible.

Senior Platform Engineer

Spectrum IT Recruitment

We are looking for a Senior Platform Engineer to join a growing platform team supporting a large-scale SaaS platform. This role focuses on improving reliability, scalability, and performance while helping drive a major cloud-native transition to Azure and Kubernetes.

You will work closely with engineering teams to modernise infrastructure, automate operations, and ensure highly observable and resilient systems in a production environment handling sensitive data.

Key Responsibilities

Design and deliver cloud migration and containerisation initiatives
Operate and improve production monitoring, alerting, and system reliability
Automate infrastructure provisioning using Pulumi and Ansible
Participate in incident response and on-call rotations
Lead blameless post-mortems and continuous improvement efforts
Collaborate with product engineering teams on observability and performance

Key Skills

Strong experience with Azure cloud architecture
Hands-on Kubernetes experience
Pulumi experience
Experience troubleshooting distributed systems
Monitoring and observability tools such as Prometheus, Grafana, or Graylog
Linux systems administration
Scripting with PowerShell, Bash, or Python
Experience with SQL Server, PostgreSQL, or Redis
Strong documentation and communication skills

Desirable

Software development background (C#, Go, TypeScript, Python)
Experience with Ansible
Go development experience
Experience in regulated environments
Familiarity with Windows / IIS environments

Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy.

Key Responsibilities

Participate in the growth, industrialization, and DevOps of the Airops Fast Data platform.
Develop event-driven data interfaces between Airops products, the data platform, and third-party applications or middleware, within an MS Azure environment.
Work within a DevOps team and participate in the maintenance of the Azure platform.
Utilize tools like Jenkins, ArgoCD, and Helm charts for continuous integration and continuous deployment (CI/CD) processes.
Work with Kubernetes and OpenShift for container orchestration and management.
Develop and maintain event streaming solutions using Kafka and Azure Event Hub.

Ideal Candidate Profile

Java Development: Previous experience as a Java developer, with expertise in Java Quarkus or Java Spring Boot (Java version 21).
Cloud Native: Experience developing in a cloud environment (MS Azure or AWS).
Event Streaming: Experience with event streaming infrastructure (e.g., Kafka, Azure Event Hub).
DevOps: Experience with DevOps infrastructure and components, including Jenkins, ArgoCD, and Helm charts.
Containerization: Proficiency in containerization and orchestration tools such as Kubernetes and OpenShift.
Software Maintenance: Experience in software maintenance, including incident resolution, root cause analysis, and post-mortem analysis.
Agile Methodologies: Experience working in Agile environments (Scrum, SAFe).

We are seeking an experienced AI Architect to join a global consulting team. This role is central to shaping enterprise-scale AI transformations, combining deep technical expertise with strategic client engagement.

As a Gen AI Architect, you will:

Lead the design and delivery of AI and cloud-native architectures, including Generative AI, NLP, and LLM solutions.
Define architectural patterns for end-to-end Retrieval-Augmented Generation (RAG) and Cache-Augmented Generation (CAG) pipelines.
Act as a trusted advisor to senior stakeholders, guiding AI roadmaps and strategy.
Translate complex business needs into scalable AI-driven solutions across public cloud, edge, and hybrid environments.
Drive thought leadership through client workshops, industry forums, and technical advisory.
Ensure AI solutions meet governance, ethics, and responsible AI standards.
Collaborate with internal teams and global partners to deliver world-class AI platforms.

Key Skills & Experience:

10+ years of technical leadership (with a strong background in Software Engineering / Enterprise scale architecture)
Knowledge of GenAI operations (ideally, experience governing AI models in production environments)
Expertise across cloud platforms (AWS, Azure, GCP), Kubernetes, and containerised systems.
Strong technical skills in Python, Java/Go, TensorFlow, PyTorch, and data engineering.
Proven ability to engage directly with CxO-level stakeholders.
Experience in MLOps, AI governance, and large-scale deployment.
Recognised professional certifications in AI or cloud technologies.

If you have these skills and would like to find out more, please apply now

Platform Operations Engineer

Job Title: Platform Operations Engineer
Department: AI Lab - Platform Operations Team
Location: Remote
Employment Type: Full-time, Permanent

Overview
We are actively looking to secure multiple Platform Operations Engineers to join Experis, part of the ManpowerGroup - a global organisation with over $20 billion in annual revenue and more than 1,000 consultants on assignment across 20 clients worldwide.
Experis UK is in an exciting growth phase, with ambitious expansion plans and deep partnerships across multiple industries. Our model is personal and career-focused: we invest in our consultants through continuous training, technology exposure, and collaborative development.

About the Role
IBM’s AI Lab is building next-generation AI platforms and services. To support this mission, we’re growing our Platform Operations Team, responsible for the cloud infrastructure that powers our AI services. As a Platform Operations Engineer, you’ll work across AWS, Kubernetes, and internal automation tools to ensure the platform runs smoothly, securely, and efficiently.
This role suits someone who enjoys working at the intersection of software development and operations - writing code, automating infrastructure, and supporting high-performance machine learning environments.

Key Responsibilities

Deploy, manage, and monitor applications on AWS EKS (Kubernetes)
Build and maintain Helm charts, manifests, and ArgoCD configurations
Contribute Python code for internal tooling, automation, and services
Manage CI/CD pipelines (e.g. Concourse, GitLab CI)
Troubleshoot issues in networking, permissions, and application performance
Work with development teams to streamline deployment and scaling of AI systems
Maintain secure cloud environments through thoughtful IAM and Terraform configurations

Essential Skills and Experience

Strong hands-on experience with Kubernetes (deployment, debugging, Helm)
Intermediate to advanced Python development skills
Familiarity with CI/CD pipelines, especially writing and debugging them
Solid understanding of AWS services (EKS, IAM, S3)
Confident with Linux-based environments and containerization (Docker)

Ideal (Bonus) Skills

Experience with Helm, ArgoCD, and GitOps workflows
Practical knowledge of Terraform for infrastructure-as-code
Understanding of Kubernetes networking, ingress management, and certificate handling
Exposure to OAuth/OpenID, certificates, and authentication proxies

Splunk and OpenShift Observability Engineer

CBSbutler Holdings Limited trading as CBSbutler

We’re looking for a Splunk & OpenShift Observability Engineer to design, deploy, and optimise enterprise-grade monitoring across hybrid Kubernetes and OpenShift environments.

This is a high-impact role where you’ll shape observability strategy, enhance service intelligence, and ensure platform reliability at scale - balancing performance, cost efficiency, and security governance.

You’ll work at the intersection of platform engineering, observability, and service intelligence, helping to transform raw telemetry into actionable insight. This is an opportunity to influence reliability strategy, improve operational maturity, and deliver measurable value across a modern cloud-native estate.

What You’ll Be Doing

Design, deploy, and operate Splunk Enterprise and ITSI across hybrid Kubernetes/OpenShift platforms
Onboard and normalise data at scale (HEC, Universal Forwarder, Deployment Server), aligning to CIM standards
Build and optimise ITSI service models: service trees, KPIs, adaptive thresholds, NEAP policies, glass tables, deep dives, and health scoring
Deliver OpenShift-focused executive and operational dashboards, including:
Cluster/API/etcd health
Node readiness and resource pressure
Pod restart trends and noisy-neighbour detection
Network and storage error visibility
Capacity, quota, and burst analysis
Optimise search and platform performance (workload rules, DMA, summary indexing, scheduling hygiene, concurrency tuning)
Implement intelligent alerting and automated routing into ITSM and ChatOps platforms, including enrichment, suppression windows, and maintenance scheduling
Govern data ingestion and security controls (RBAC, retention, PII handling, TLS, token governance, index and role mapping)
Integrate telemetry pipelines including OpenTelemetry, Prometheus, Fluentd/Fluent Bit/Vector, Kafka, CMDB and AIOps/ML solutions
Drive SLO/KPI alignment, golden signal monitoring, rollout/rollback health validation, and executive reporting

What You’ll Bring

Deep expertise in Splunk Enterprise (SPL mastery, CIM alignment, saved searches, macros, KV stores, index/retention/RBAC design, performance tuning)
Strong experience with Splunk ITSI (service trees, KPIs, adaptive/time-based thresholds, NEAP tuning, Service Analyzer configuration)
Proven OpenShift/Kubernetes observability experience across control-plane metrics, events, logs, workload correlation, and capacity management
Hands-on experience with telemetry pipelines (OpenTelemetry/OTLP, Prometheus exporters, Fluentd/Fluent Bit/Vector, Kafka with TLS, HEC/UF/DS onboarding)
Strong understanding of reliability engineering principles (golden signals, SLO design, namespace/application KPI mapping)
Experience optimising performance and licensing costs using workload rules, DMA, and summary indexing
Solid security and compliance knowledge (TLS/mTLS, certificate/token hygiene, PII controls, auditability, role/index mapping)
Automation and integration expertise across ITSM, ChatOps, webhooks, CMDB enrichment, and AIOps tooling

Backend Software Engineer

Key Responsibilities

Design, develop, test, and maintain secure, high-performance backend services using modern programming languages.
Write clean, efficient, and maintainable code with a strong focus on reliability, performance, and security.
Translate solution architectures and business requirements into detailed technical designs and implementations.
Build, deploy, and manage containerised applications on Kubernetes using Helm and continuous deployment tools.
Support Agile delivery by contributing to sprint planning, backlog refinement, and user story development.
Produce clear, accurate, and high-quality technical documentation in line with agreed standards.
Participate actively in Agile ceremonies, including stand-ups, planning sessions, reviews, and retrospectives.
Monitor live systems, investigate performance or reliability issues, and implement fixes and enhancements.
Collaborate with DevOps and platform teams to improve automation, testing, and deployment pipelines.
Contribute to prototyping, experimentation, and the development of innovative technical solutions.
Ensure compliance with organisational processes, quality standards, and security requirements.
Support continuous improvement through code reviews, knowledge sharing, and adoption of new technologies.

Reasonable Adjustments:

If you need any help or adjustments during the recruitment process for any reason, please let us know when you apply or talk to the recruiters directly so we can support you.

Role Summary

We are seeking a highly skilled MLOps Engineer to focus on the deployment, monitoring, and maintenance of machine learning models in production environments. This role is platform-focused and does not involve model development or end-user support. The successful candidate will ensure reliability, scalability, and performance of ML platforms while managing API endpoints and deployment workflows.

Key Responsibilities

Platform Operations & Monitoring

Monitor ML model endpoints and platform health using tools such as Grafana and Domino Data Lab
Respond to incidents and alerts; perform code fixes and manage changes via ServiceNow
Liaise with Domino Data Lab support to resolve platform-related issues

Model Deployment

Deploy and maintain ML models in production environments
Ensure models integrate seamlessly into automated pipelines
Maintain reliability, version control, and governance standards

Pipeline Maintenance

Collaborate with Data Scientists and Engineers for smooth production handoff
Maintain and optimize ML pipelines for stability and scalability
Improve performance, resource usage, and automation

Automation & Tooling

Implement automation for deployment and monitoring
Contribute to continuous platform improvements

Required Skills & Experience

Strong Python programming experience
Proven experience deploying and monitoring ML models in production
Understanding of model evaluation metrics, data drift, overfitting, and feature importance
Experience with AWS services (S3, Redshift, etc.)
Hands-on experience with Grafana for monitoring
Familiarity with Domino Data Lab (desirable)
Strong knowledge of CI/CD, version control, Docker, Kubernetes
Excellent troubleshooting and incident management skills
Strong stakeholder communication skills


Frequently asked questions

Haystack features a wide range of remote Kubernetes jobs including roles for developers, DevOps engineers, site reliability engineers (SREs), cloud architects, and system administrators across various industries.

While certifications like CKA (Certified Kubernetes Administrator) or CKAD (Certified Kubernetes Application Developer) can boost your profile, many employers also value hands-on experience and relevant skills over formal certification.

You can use Haystack’s search filters to narrow down jobs by experience level, such as entry-level, mid-level, or senior positions, ensuring you find Kubernetes roles that fit your background and career goals.

Haystack features a variety of remote Kubernetes job types, including full-time, part-time, contract, freelance, and temporary positions, giving you flexibility to choose what suits your preferences.

Haystack vets all job postings through a comprehensive screening process to confirm the employer's authenticity and the accuracy of job details, ensuring you find only trustworthy remote Kubernetes opportunities.