Prometheus Jobs

Overview

Looking for top Prometheus jobs? Explore the latest Prometheus monitoring and alerting roles on Haystack, the leading IT job board. Whether you're a developer, DevOps engineer, or site reliability specialist, find your perfect Prometheus job today and advance your career in cloud-native infrastructure and observability. Start your search now!
Microsoft Azure DevOps Engineer - Manchester
Capgemini
Manchester
Hybrid
Mid
Private salary
RECENTLY POSTED
microsoft-azure
windows
ruby
prometheus
javascript
terraform
+11
Microsoft Azure DevOps Engineer - Manchester
Reference Code: -en_GB
Contract Type: Permanent
Professional Communities: Software Engineering
About the job you’re considering
Our Microsoft Azure DevOps/Platform Engineers work on some of the largest Microsoft projects on the market, delivering enterprise-scale projects in a wide range of industries. They lead project implementations, ensuring solutions are secure, scalable, and fit for purpose, all within the Microsoft technology stack. As subject matter experts and trusted advisors in everything DevSecOps, they build out continuous integration paired with solid release strategies that adhere to recommended security best practices. Additionally, they can upskill our customers' team members and provide strategic advice to management on best practices. Their passion for DevOps is anchored in Microsoft technologies, and their knowledge of other open-source options enables them to give well-rounded and highly informed advice.
Hybrid working:  The places that you work from day to day will vary according to your role, your needs, and those of the business; it will be a blend of Company offices, client sites, and your home; noting that you will be unable to work at home 100% of the time.
If you are successfully offered this position, you will go through a series of pre-employment checks, including:
identity, nationality (single or dual) or immigration status, employment history going back 3 continuous years, and unspent criminal record check (known as Disclosure and Barring Service)
Your role
Lead the design, implementation, and management of Azure DevOps pipelines and infrastructure.
Develop and maintain CI/CD pipelines to automate the build, test, and deployment processes.
Collaborate with development, QA, and operations teams to ensure seamless integration and delivery of software solutions.
Implement and manage infrastructure as code (IaC) using tools such as ARM templates, Terraform, or Bicep.
Monitor and optimize the performance, scalability, and reliability of our cloud infrastructure.
Ensure security best practices are followed in all DevOps processes and tools.
Provide technical leadership and mentorship to junior DevOps engineers.
Stay up to date with the latest industry trends and best practices in DevOps, AI and cloud technologies.
You can bring your whole self to work. At Capgemini building an inclusive future is part of everyday life and will be part of your working reality. We have built a representative and welcoming environment, for everyone.
Your skills and experience
Proven experience with Azure DevOps, including Boards, Repos, Pipelines, and Artifacts.
Strong scripting capabilities in PowerShell, Bash, or Python.
Practical experience designing and implementing CI/CD workflows using GitHub Actions and GitLab Pipelines, including automation, testing, and deployment best practices.
Experience with modern programming languages such as Java, C#, JavaScript, Python, Go, or Ruby, alongside scripting languages.
Hands-on experience with Infrastructure as Code (IaC) and automation tools like Bicep and Terraform, enabling the creation and maintenance of complex cloud environments.
Solid understanding of cloud service models including PaaS, Serverless, and IaaS (e.g., VMs, storage), with a focus on secure configuration and DevSecOps principles.
Familiarity with AI-enhanced DevOps practices, including how AI can improve CI/CD, observability, testing, and infrastructure automation.
Ability to ensure AI tools comply with enterprise-grade security, privacy, and governance standards.
Experience with source control systems such as GitHub and Azure DevOps
Working knowledge of Azure services including Application Insights, Azure DevTest Labs, API Management, Web and Mobile Apps, and Windows VMs.
Practical experience with containerisation technologies like Docker and Kubernetes, and cloud-native architecture patterns.
Familiarity with monitoring and observability tools such as Azure Monitor, App Insights, Prometheus, and Grafana.
Demonstrated ability to implement and manage GitOps workflows, using Git as the single source of truth for infrastructure and application configurations, and enabling continuous delivery through automated CI/CD pipelines.
Ability to identify and integrate AI-driven tools into existing DevOps workflows to enhance efficiency and resilience.
Awareness of security and compliance frameworks such as ISO 27001, SOC 2, and related standards.
Azure certifications such as AZ-400 (Designing and Implementing Microsoft DevOps Solutions).
AI-related certifications (e.g., AI-102 Azure AI Engineer Associate) are advantageous.
Knowledge of security best practices in cloud environments such as Certified Cloud Security Professional (CCSP) or Azure Security Engineer Associate (AZ-500)
GitOps Certified Associate (CGOA)
Your security clearance
To be successfully appointed to this role, it is a requirement to obtain Security Check (SC) clearance. 
To obtain SC clearance, the successful applicant must have resided continuously within the United Kingdom for the last 5 years, along with other criteria and requirements.
Throughout the recruitment process, you will be asked questions about your security clearance eligibility such as, but not limited to, country of residence and nationality. Some posts are restricted to sole UK Nationals for security reasons; therefore, you may be asked about your citizenship in the application process.
What does ‘Get the Future You Want’ mean for you?
You will be encouraged to have a positive work-life balance. Our hybrid-first way of working means we embed hybrid working in all that we do and make flexible working arrangements the day-to-day reality for our people. All UK employees are eligible to request flexible working arrangements.
You will be joining one of the World’s Most Ethical Companies®, as recognised by Ethisphere® for 12 consecutive years. We live our values by making ethical business choices every day. Working ethically is at the centre of our culture at Capgemini, meaning you will be helping to create a future we can all be proud of.
Capgemini. Get The Future You Want.
Why you should consider Capgemini
Growing clients’ businesses while building a more sustainable, more inclusive future is a tough ask. When you join Capgemini, you’ll join a thriving company and become part of a collective of free-thinkers, entrepreneurs and industry experts. We find new ways technology can help us reimagine what’s possible. It’s why, together, we seek opportunities that will transform the world’s leading businesses, and it’s how you’ll gain the experiences and connections you need to shape your future. By learning from each other every day, sharing knowledge, and always pushing yourself to do better, you’ll build the skills you want. You’ll use your skills to help our clients leverage technology to innovate and grow their business. So, it might not always be easy, but making the world a better place rarely is.
About Capgemini
Capgemini is a global business and technology transformation partner, helping organisations to accelerate their dual transition to a digital and sustainable world, while creating tangible impact for enterprises and society. It is a responsible and diverse group of 340,000 team members in more than 50 countries. With its strong over 55-year heritage, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. 
It delivers end-to-end services and solutions leveraging strengths from strategy and design to engineering, all fuelled by its market leading capabilities in AI, generative AI, cloud and data, combined with its deep industry expertise and partner ecosystem. The Group reported 2024 global revenues of €22.1 billion. 
DevOps Engineer GCP (Google Cloud) - London - London
Capgemini
London
Hybrid
Mid
Private salary
RECENTLY POSTED
prometheus
terraform
ansible
grafana
kubernetes
python
+4
DevOps Engineer GCP (Google Cloud) - London - London
Reference Code: -en_GB
Contract Type: Permanent
Professional Communities: Software Engineering
Job Title: DevOps Engineer GCP (Google Cloud)
Hybrid:  2 days a week working from the office
Location: London, UK
Get the future you want!
Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you’d like, where you’ll be supported and inspired by a collaborative community of colleagues around the world, and where you’ll be able to reimagine what’s possible. Join us and help the world’s leading organizations unlock the value of technology and build a more sustainable, more inclusive world.
Your Role
Capgemini Financial Services is seeking a talented and experienced DevOps Engineer with strong expertise in Google Cloud Platform (GCP) in London, UK, to join our engineering team supporting strategic initiatives, including those in the financial services domain. Experience with blockchain technologies is a plus but not mandatory.
Key Responsibilities:
Design, implement, and manage scalable and secure cloud infrastructure on GCP
Build and maintain CI/CD pipelines using tools like Jenkins, GitLab CI, or Cloud Build
Automate infrastructure provisioning using Terraform, Ansible, or similar IaC tools
Manage containerized applications using Kubernetes (GKE) and Docker
Monitor system performance and ensure high availability using tools like Prometheus, Grafana, or Stackdriver
Collaborate with development, operations, and security teams to ensure smooth delivery pipelines
Participate in incident response, root cause analysis, and continuous improvement initiatives
Document infrastructure and deployment processes
Your Profile
3 years of experience in DevOps or Cloud Engineering roles
Strong hands-on experience with Google Cloud Platform (GCP)
Proficiency in Terraform, Ansible, or similar IaC tools
Experience with Kubernetes, Docker, and container orchestration
Familiarity with CI/CD tools like Jenkins, GitLab CI, or Cloud Build
Solid understanding of cloud networking, security best practices, and DevSecOps principles
Scripting skills in Python, Bash, or similar languages
Strong problem-solving and communication skills
Nice to Have
Exposure to blockchain platforms (e.g., Hyperledger Besu, Ethereum)
Experience with multi-cloud or hybrid cloud strategies
Familiarity with DORA metrics, AIOps, or MLOps practices
Certifications such as Google Professional Cloud DevOps Engineer, CKA, or DevOps Leader
About Capgemini
Capgemini is a global business and technology transformation partner, helping organizations to accelerate their dual transition to a digital and sustainable world, while creating tangible impact for enterprises and society. It is a responsible and diverse group of 340,000 team members in more than 50 countries. With its strong over 55-year heritage, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions leveraging strengths from strategy and design to engineering, all fueled by its market leading capabilities in AI, cloud and data, combined with its deep industry expertise and partner ecosystem. The Group reported 2023 global revenues of €22.5 billion.
Platform Engineer - Bristol
Capgemini
Bristol
Hybrid
Mid
Private salary
RECENTLY POSTED
linux
aws
prometheus
terraform
github
git
+13
Platform Engineer - Bristol
Reference Code: 89353-en_GB
Contract Type: Permanent
Professional Communities: Software Engineering
About the job you’re considering
Are you a passionate Platform Engineer eager to make a difference? We invite you to be part of our agile team, helping public sector clients build and continuously improve digital services using the best open-source software.
We are looking for experienced Platform Engineers who are ready to roll up their sleeves and dive into challenges: quick learners who thrive on innovation and problem-solving, and individuals who value collaboration and are excited to share ideas.
You’ll be part of a strong, established community of digital specialists. Together, you will share your ideas, innovate and grow. Our team of engineers supports each other to deliver and develop professionally and you’ll get to work alongside amazing people in one of the best cultures you can find.
Hybrid working:  The places that you work from day to day will vary according to your role, your needs, and those of the business; it will be a blend of Company offices, client sites, and your home; noting that you will be unable to work at home 100% of the time.
Your role
You’ll undertake cross-functional engineering projects, working in small teams with other Engineers in different knowledge spheres, building supportable, sustainable and reliable services.
You’ll work with great technology – the UK Government has an IT strategy based on user centric design, modern open source technology, continuous integration/delivery and modern software architectures such as microservices and cloud technology to name but a few.
You’ll help us achieve our vision – the chance to make a real difference at the heart of the digital transformation within the UK Public Sector
You’ll get opportunities to develop and progress – working on large scale, technically challenging projects. You will be constantly learning through modern learning environments.
You can bring your whole self to work. At Capgemini, striving for equity, diversity and inclusion is part of everyday life, and will be part of your working reality. We have built an inclusive and welcoming environment, for everyone.
Your Security Clearance
Baseline Personnel Security Standard (BPSS)
To be successfully appointed to this role you will need to undergo Baseline Personnel Security Standard checks. 
There are certain criteria and checks required for BPSS, and throughout the recruitment process, you will be asked questions about your security clearance eligibility such as, but not limited to, country of residence and nationality.
In addition to BPSS, you will also need SC (Security Check) Clearance or to be eligible for this level of clearance (by being a UK resident for at least 5 years and not having left the country for more than 28 consecutive days during this period).
Your skills and experience
Are you a Platform/DevOps Engineer with an appetite to extend your knowledge and apply new skills rapidly on challenging projects?  Do you have a broad range of experience in modern open source stacks? Don’t worry if you haven’t got strong experience in every skill listed below. If you can demonstrate a good selection of these skills and a passion for developing exciting new technical skills, then we’d really like to talk to you.
Essential skills:
•    Experience debugging complex, multi-server services in a high availability production environment.
•    Good communication skills both written and verbal, with the ability to convey complex technical concepts clearly and concisely to both internal and external stakeholders.
•    Sound knowledge of Linux, Docker and Kubernetes.
•    Experience of building infrastructure in AWS using Terraform.
•    Experience in automation tooling such as Ansible.
•    Ability to provide technical leadership, whether on a project or to more junior members of your team.
•    Strong understanding of CI environments and how they facilitate the automation process with other orchestration tools.
Desirable skills:
•    Core Networking & Load Balancing.
•    Jenkins / Concourse / GitLab CI.
•    Git / GitHub / GitLab.
•    Graphite / Grafana / ELK / Prometheus.
•    SQL / NoSQL databases.
•    HashiCorp (Packer / Terraform / Vault / Consul).
What does “Get the future you want” mean for you?
You’d be joining an organisation accredited as a Great Place to Work for Wellbeing in 2023. Employee wellbeing is vitally important to us as an organisation. We see a healthy and happy workforce as a critical component in achieving our organisational ambitions. To help support wellbeing we have trained ‘Mental Health Champions’ across each of our business areas, and we have invested in wellbeing apps such as Thrive and Peppy.
You will reimagine what’s possible: creating value for the world’s leading organisations through technology to build a sustainable, more inclusive future. You will work with a range of clients all with a unique set of business, technological and societal ambitions, which will make a real impact across the UK.
You will be empowered to explore, innovate, and progress. You will benefit from Capgemini’s ‘learning for life’ mindset, meaning you will have countless training and development opportunities from thinktanks to hackathons, and access to 250,000 courses with numerous external certifications from AWS, Microsoft, Harvard ManageMentor, Cybersecurity qualifications and much more.
You’ll be bringing your unique skills and perspectives to the team, inspiring and taking inspiration from your teammates as you unlock value in everything you do. You’ll be joining a professional community of experts, who have got your back and will support you, every step of the way.
Why you should consider Capgemini
Growing clients’ businesses while building a more sustainable, more inclusive future is a tough ask.  But when you join Capgemini, you join a thriving company and become part of a diverse collective of free-thinkers, entrepreneurs and industry experts.
A powerful source of energy that drives us all to find new ways technology can help us reimagine what’s possible.  It’s why, together, we seek out opportunities that will transform the world’s leading businesses. And it’s how you’ll gain the experiences and connections you need to shape your future.   By learning from each other every day, sharing knowledge and always pushing yourself to do better, you’ll build the skills you want. And you’ll use them to help our clients leverage technology to grow their business and give innovation that human touch the world needs. So, it might not always be easy, but making the world a better place rarely is.
DataOps Engineer - London
Capgemini
London
Hybrid
Mid
Private salary
RECENTLY POSTED
processing-js
aws
prometheus
terraform
ansible
grafana
+5
DataOps Engineer - London
Reference Code: -en_GB
Contract Type: Permanent
Professional Communities: Software Engineering
The Job You’re Considering
The Cloud Data Platforms team is part of the Insights and Data Global Practice and has seen strong growth and continued success across a variety of projects and sectors. Cloud Data Platforms is the home of the Data Engineers, Platform Engineers, Solutions Architects and Business Analysts who are focused on driving our customers' digital and data transformation journeys using modern cloud platforms. We specialise in using the latest frameworks, reference architectures and technologies across AWS, Azure and GCP.
Hybrid working: The places that you work from day to day will vary according to your role, your needs, and those of the business; it will be a blend of Company offices, client sites, and your home; noting that you will be unable to work at home 100% of the time.
If you are successfully offered this position, you will go through a series of pre-employment checks, including: identity, nationality (single or dual) or immigration status, employment history going back 3 continuous years, and unspent criminal record check (known as Disclosure and Barring Service)
Your Role
The DataOps Engineer role focuses on designing, building, automating, and orchestrating data pipelines and applications within containerised environments, primarily Kubernetes. This role bridges the gap between traditional cloud data engineering and DevOps, emphasising automation and continuous delivery of data solutions. Your work will involve:
Designing, building, automating and orchestrating data pipelines using tools such as Airflow, Prefect, or Dagster.
Containerising data applications using Docker and deploying them to Container Platforms (EKS, AKS and Kubernetes).
Implementing and managing CI/CD pipelines for data applications.
Implementing and managing comprehensive monitoring and observability solutions using tools like Grafana, Prometheus, and other non-native monitoring tools, ensuring data quality across the entire data flow.
Working with Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible) to provision and manage data infrastructure within pre-existing platforms.
Optimising data processing for performance and scalability.
You can bring your whole self to work. At Capgemini, equity, diversity and inclusion are part of everyday life, and will be part of your working reality. We have built an inclusive and welcoming environment for everyone.
Your Skills and Experience
Proficiency in data pipeline orchestration tools (e.g., Airflow, Prefect, Dagster).
Extensive experience with Docker and Kubernetes.
Proficiency in CI/CD principles and tools.
Familiarity with open-source data tools (e.g., Spark, Kafka, PostgreSQL).
Competent understanding of IaC concepts (e.g., Terraform, Ansible).
Understanding of data architecture principles.
Experience with monitoring and observability tools like Grafana and Prometheus.
Your Security Clearance
To be successfully appointed to this role, it is a requirement to obtain Security Check (SC) clearance. 
To obtain SC clearance, the successful applicant must have resided continuously within the United Kingdom for the last 5 years, along with other criteria and requirements.
Throughout the recruitment process, you will be asked questions about your security clearance eligibility such as, but not limited to, country of residence and nationality.
Some posts are restricted to sole UK Nationals for security reasons; therefore, you may be asked about your citizenship in the application process.
What does ‘Get The Future You Want’ mean for you?
You will be encouraged to have a positive work-life balance.  Our hybrid-first way of working means we embed hybrid working in all that we do and make flexible working arrangements the day-to-day reality for our people.  All UK employees are eligible to request flexible working arrangements.
You will be empowered to explore, innovate, and progress. You will benefit from Capgemini’s ‘learning for life’ mindset, meaning you will have countless training and development opportunities from thinktanks to hackathons, and access to 250,000 courses with numerous external certifications from AWS, Microsoft, Harvard ManageMentor, Cybersecurity qualifications and much more.
Why you should consider Capgemini
Growing clients’ businesses while building a more sustainable, more inclusive future is a tough ask. When you join Capgemini, you’ll join a thriving company and become part of a collective of free-thinkers, entrepreneurs and industry experts. We find new ways technology can help us reimagine what’s possible. It’s why, together, we seek out opportunities that will transform the world’s leading businesses, and it’s how you’ll gain the experiences and connections you need to shape your future. By learning from each other every day, sharing knowledge, and always pushing yourself to do better, you’ll build the skills you want. You’ll use your skills to help our clients leverage technology to innovate and grow their business. So, it might not always be easy, but making the world a better place rarely is.
Site Reliability Engineer III - Credit Risk
J.P. MORGAN-1
Glasgow
Hybrid
Mid
Private salary
RECENTLY POSTED
android
prometheus
dot-net
terraform
grafana
kubernetes
+8
Job Description
There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world’s most complex and mission-critical systems.
As a Site Reliability Engineer III at JPMorgan Chase within the Cross Risk Technology team, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.
Job responsibilities
Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate
Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
Implements infrastructure, configuration, and network as code for the applications and platforms in your remit
Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers
Supports the adoption of site reliability engineering best practices within your team
Required qualifications, capabilities, and skills
Formal training or certification in site reliability culture and principles, and proficient applied experience
Proficient in site reliability culture and principles and familiarity with how to implement site reliability within an application or platform
Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.)
Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision
Ability to proactively recognize roadblocks and demonstrate interest in learning technology that facilitates innovation
Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team
Ability to initiate and implement ideas to solve business problems
Preferred qualifications, capabilities, and skills
Familiarity with container and container orchestration such as ECS, Kubernetes, and Docker
Familiarity with troubleshooting common networking technologies and issues
Ability to code in at least one programming language such as Python, Java/Spring Boot, or .NET
About Us
J.P. Morgan is a global leader in financial services, providing strategic advice and products to the world’s most prominent corporations, governments, wealthy individuals and institutional investors. Our first-class business in a first-class way approach to serving clients drives everything we do. We strive to build trusted, long-term partnerships to help our clients achieve their business objectives.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.
About The Team
Our professionals in our Corporate Functions cover a diverse range of areas from finance and risk to human resources and marketing. Our corporate teams are an essential part of our company, ensuring that we’re setting our businesses, clients, customers and employees up for success.
Java Lead Software Engineer
J.P. MORGAN-1
Glasgow
Hybrid
Leader
Private salary
RECENTLY POSTED
java
aws
prometheus
spring-boot
terraform
grafana
+9
Job Description
We have an opportunity to impact your career and provide an adventure where you can push the limits of what’s possible.
As a Lead Software Engineer at JPMorgan Chase within the Asset Management Position/Accounting & Holding world, you play a crucial role in an agile Site Reliability Engineering team. Your focus is on applying software engineering principles to automate operations and ensure the reliability, availability, and performance of software systems. Key responsibilities include instrumenting applications to create telemetry data, enhancing observability and monitoring, automating manual operational tasks, troubleshooting incidents, managing infrastructure, performing capacity planning, and conducting blameless post-incident reviews to improve system design and reduce downtime. This role bridges the gap between development and operations, requiring both strong software coding skills and operational expertise.
Job responsibilities
Seek continuous improvement of reliability, monitoring, and alerting for our mission-critical microservices.
Instrument Spring Boot Moneta apps for telemetry data.
Apply technical expertise and problem-solving methodologies to provide support for multiple Cloud products and services.
Implement and maintain scalable, secure, and efficient cloud environments on the AWS platform: automating infrastructure deployment, monitoring performance, optimizing costs, ensuring security and compliance, and troubleshooting issues.
Design monitoring and alerting that is customer journey-based and directly proportionate to customer experience, supporting our ‘you build it, you own it’ model. Our alerts must be highly precise, as developer teams are engaged immediately with no triage.
Think outside of the box to eliminate toil and enable controls excellence, automating as much as possible.
Work hands-on with cutting-edge AI/ML tools (Copilot, GPT, Cline) to solve day-to-day issues.
Contribute to internal tools, including our state-of-the-art framework for SLI and error budget aggregation.
Enhance performance testing, forecasting, and capacity planning framework.
Contribute to chaos engineering framework.
Add to team culture of diversity, opportunity, inclusion, and respect.
Required qualifications, capabilities, and skills
Formal training or certification in software engineering concepts and proficient advanced experience.
Fluency and proficiency in Java Spring Boot/Python.
Proven experience in infrastructure engineering for AWS platform/SRE/DevOps or similar.
Knowledge of networking terminology, databases, compute, storage, deployment practices, integration, automation, scaling, resilience, or performance assessments.
Hands-on practical experience delivering system design, application development, testing, and operational stability.
Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform, etc.), container and container orchestration (e.g., ECS, Kubernetes, Docker, etc.).
Proficiency and experience in observability such as white and black box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc.
Experienced in SRE/DevOps practices (SLA/SLOs, error budgets, MTTR, MTTD).
Practical cloud native experience.
Preferred qualifications, capabilities, and skills
AWS Certifications (e.g. Solutions Architect Associate).
AI/ML knowledge.
About Us
J.P. Morgan is a global leader in financial services, providing strategic advice and products to the world’s most prominent corporations, governments, wealthy individuals and institutional investors. Our first-class business in a first-class way approach to serving clients drives everything we do. We strive to build trusted, long-term partnerships to help our clients achieve their business objectives.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.
About The Team
J.P. Morgan Asset & Wealth Management delivers industry-leading investment management and private banking solutions. Asset Management provides individuals, advisors and institutions with strategies and expertise that span the full spectrum of asset classes through our global network of investment professionals. Wealth Management helps individuals, families and foundations take a more intentional approach to their wealth or finances to better define, focus and realize their goals.
Cloud Software Engineer III
J.P. MORGAN-1
Dorset
Hybrid
Mid
Private salary
RECENTLY POSTED
linux
android
windows
aws
prometheus
terraform
+8
Job Description
We have an exciting and rewarding opportunity for you to take your software engineering career to the next level.
As a Software Engineer III at JPMorgan Chase within the Cloud Foundational Services Team, part of the Site Reliability Engineering Team, you will use technology to solve business problems and leverage software engineering best practices as we strive towards excellence. This role often works independently to execute small to medium projects, but you’ll also have the opportunity to collaborate with cross functional teams to continually improve your level of knowledge about JPMorgan Chase’s business and relevant technologies.
Job responsibilities
Executes small to medium projects independently with initial direction and eventually graduates to designing and delivering projects by yourself
Leverages technology to solve business problems by writing high quality, maintainable, and robust code following best practices in software engineering
Participates in triaging, examining, diagnosing, and resolving incidents and works with others to solve problems at their root
Recognizes the toil within your role and proactively works towards eliminating it through either systems engineering or updating application code
Understands observability patterns and strives to implement and improve service level indicators and objectives, monitoring, and alerting solutions for optimal transparency and analysis
Adds to team culture of diversity, opportunity, inclusion, and respect
Required qualifications, capabilities, and skills
Formal training or certification on Software Engineering concepts and proficient applied experience
Ability to code in at least one programming language, preferably Python or Go
Experience maintaining a cloud-based infrastructure
Familiar with site reliability concepts, principles, and practices
Familiar with observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
Familiarity with containers or a common Server OS such as Linux and Windows
Emerging knowledge of software, applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.)
Emerging knowledge of continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
Emerging knowledge of common networking technologies
Ability to work in a large, collaborative team and willingness to vocalize ideas with peers and managers
Understanding of how to prioritize and adjust work plans to adapt to changes in assigned responsibilities and projects
Eagerness to participate in learning opportunities to enhance one’s effectiveness in executing day-to-day project activities
Ability to demonstrate and apply existing and new system processes, methodologies, and skills to contribute to the development of systems
Preferred qualifications, capabilities, and skills
General knowledge of financial services industry
Experience working with Amazon Web Services (AWS)
About Us
J.P. Morgan is a global leader in financial services, providing strategic advice and products to the world’s most prominent corporations, governments, wealthy individuals and institutional investors. Our first-class business in a first-class way approach to serving clients drives everything we do. We strive to build trusted, long-term partnerships to help our clients achieve their business objectives.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.
About The Team
Our professionals in our Corporate Functions cover a diverse range of areas from finance and risk to human resources and marketing. Our corporate teams are an essential part of our company, ensuring that we’re setting our businesses, clients, customers and employees up for success.
Site Reliability Engineer III
J.P. MORGAN-1
Glasgow
Hybrid
Mid
Private salary
RECENTLY POSTED
android
prometheus
terraform
grafana
kubernetes
python
+9
Job Description
There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world’s most complex and mission-critical systems.
As a Site Reliability Engineer III at JPMorgan Chase within Corporate Technology you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.
Job responsibilities
Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate
Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
Implements infrastructure, configuration, and network as code for the applications and platforms in your remit
Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers
Supports the adoption of site reliability engineering best practices within your team
Required qualifications, capabilities, and skills
Formal training or certification on SRE concepts and proficient applied experience
Proficient in at least one programming language such as Python or Java/Spring Boot
Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.)
Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
Familiarity with container and container orchestration such as ECS, Kubernetes, and Docker
Familiarity with troubleshooting common networking technologies and issues
Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision
Ability to proactively recognize roadblocks and demonstrate interest in learning technology that facilitates innovation
Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team
Ability to initiate and implement ideas to solve business problems
Preferred qualifications, capabilities, and skills
Exposure to SRE/DevOps practices (SLA/SLOs, error budgets, MTTR, MTTD).
Experience with containers and orchestration (Docker, Kubernetes).
Exposure to AI/Automation technologies that improve operations.
Good understanding of Java and SQL.
About Us
J.P. Morgan is a global leader in financial services, providing strategic advice and products to the world’s most prominent corporations, governments, wealthy individuals and institutional investors. Our first-class business in a first-class way approach to serving clients drives everything we do. We strive to build trusted, long-term partnerships to help our clients achieve their business objectives.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.
About The Team
Our professionals in our Corporate Functions cover a diverse range of areas from finance and risk to human resources and marketing. Our corporate teams are an essential part of our company, ensuring that we’re setting our businesses, clients, customers and employees up for success.
Software Engineer II - Order Management System & Infrastructure
J.P. MORGAN-1
Glasgow
Hybrid
Mid
Private salary
RECENTLY POSTED
linux
react
prometheus
grafana
kubernetes
python
+7
Job Description
As a Software Engineer within Asset & Wealth Management at JPMorgan Chase, you will contribute to the development, deployment, and support of critical applications. In addition to your primary software engineering responsibilities, you will gain exposure to site reliability engineering (SRE) practices, supporting the stability and performance of our systems. This role is ideal for candidates with a strong foundation in software development who are eager to broaden their skills in reliability and operational excellence.
Key Responsibilities
Software Development:
Design, develop, test, and maintain applications using Java, Python, or React.
Participate in code reviews and collaborate with senior engineers to deliver high-quality software solutions.
Database Management:
Write and optimize SQL queries.
Support database-related tasks, including troubleshooting and performance tuning, primarily with Oracle SQL.
Linux Operations:
Utilize Linux command-line tools for development, deployment, and basic troubleshooting.
Site Reliability Support:
Assist in monitoring application health and performance using tools such as Grafana, Prometheus, or Splunk.
Participate in incident response activities, including data collection and documentation.
Contribute to basic automation and scripting tasks to improve operational efficiency.
Collaboration and Learning:
Work closely with cross-functional teams, including SREs and senior software engineers.
Proactively seek opportunities to learn new technologies and best practices in both software engineering and site reliability.
Required Qualifications, Capabilities, and Skills
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
Proficiency in at least one programming language (Java, Python, or React).
Basic experience with SQL and relational databases, preferably Oracle SQL.
Familiarity with Linux operating systems and command-line tools.
Strong problem-solving skills and attention to detail.
Willingness to learn and apply SRE concepts, including monitoring, automation, and incident response.
Effective communication and teamwork skills.
Preferred Qualifications, Capabilities, and Skills
Exposure to CI/CD tools such as Jenkins or GitLab.
Experience with monitoring and alerting tools (Grafana, Prometheus, Splunk, etc.).
Familiarity with containerization technologies (Docker, Kubernetes).
Experience with scripting or automation (e.g., Bash, Python).
Interest in operational excellence and system reliability.
About Us
J.P. Morgan is a global leader in financial services, providing strategic advice and products to the world’s most prominent corporations, governments, wealthy individuals and institutional investors. Our first-class business in a first-class way approach to serving clients drives everything we do. We strive to build trusted, long-term partnerships to help our clients achieve their business objectives.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.
About The Team
J.P. Morgan Asset & Wealth Management delivers industry-leading investment management and private banking solutions. Asset Management provides individuals, advisors and institutions with strategies and expertise that span the full spectrum of asset classes through our global network of investment professionals. Wealth Management helps individuals, families and foundations take a more intentional approach to their wealth or finances to better define, focus and realize their goals.
Lead Site Reliability Engineer - Vice President
J.P. MORGAN-1
London
Hybrid
Leader
Private salary
RECENTLY POSTED
aws
prometheus
dot-net
spring-boot
terraform
grafana
+11
Job Description
Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability.
As a Lead Site Reliability Engineer at JPMorgan Chase within the Infrastructure Platforms team, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. You take the lead in conducting resiliency design reviews, break up complex problems into digestible work for other engineers, act as a technical lead for medium to large-sized products, and provide advice and mentoring to other engineers.
Job responsibilities
Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team
Leads initiatives to improve the reliability and stability of your team’s applications and platforms using data-driven analytics to improve service levels
Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers
Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise
Acts as the main point of contact during major incidents for your application and demonstrates the skills to identify and solve issues quickly to avoid financial losses
Documents and shares knowledge within your organization via internal forums and communities of practice
Required qualifications, capabilities, and skills
Formal training or certification on system design concepts and proficient advanced experience
Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices with the ability to implement these practices within an application or platform
Fluency in at least one programming language (e.g., Python, Java Spring Boot, .NET, etc.)
Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines
Proficiency and experience in observability such as white and black box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc.
Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform, etc.)
Experience with container and container orchestration (e.g., ECS, Kubernetes, Docker, etc.)
Experience with troubleshooting common networking technologies and issues
Ability to identify and solve problems related to complex data structures and algorithms
Drive to self-educate and evaluate new technology
Ability to teach new programming languages to team members
Preferred qualifications, capabilities, and skills
Experience with AWS and Python
About Us
J.P. Morgan is a global leader in financial services, providing strategic advice and products to the world’s most prominent corporations, governments, wealthy individuals and institutional investors. Our first-class business in a first-class way approach to serving clients drives everything we do. We strive to build trusted, long-term partnerships to help our clients achieve their business objectives.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.
About The Team
Our professionals in our Corporate Functions cover a diverse range of areas from finance and risk to human resources and marketing. Our corporate teams are an essential part of our company, ensuring that we’re setting our businesses, clients, customers and employees up for success.
MongoDB-Site Reliability
Barclays Bank PLC
London
In office
Mid
Private salary
RECENTLY POSTED
mongodb
prometheus
chef
ansible
grafana
kubernetes
+4
Join our team as a MongoDB Site Reliability Engineer, where you’ll be at the forefront of designing and maintaining robust, high-performance systems that power critical financial services. In this dynamic and fast-paced environment, your role will be essential to ensuring our infrastructure remains resilient, secure, and scalable. You’ll work on automating operations, enhancing system observability, and driving continuous improvements that reduce downtime and improve efficiency. If you’re motivated by solving multi-layered problems and building systems that perform reliably amid shifting priorities, we encourage you to apply.
To be successful as a MongoDB Site Reliability Engineer, you should have experience with:
Working in Site Reliability Engineering, DevOps, and MongoDB administration in financial services.
Using MongoDB features like replica sets, sharding, backups, performance tuning, and shell scripting.
Writing scripts in Python or Bash to automate tasks and reduce manual work.
Some other highly valued skills may include:
Using Percona, ClusterControl, CI/CD tools, and automation platforms like Ansible or Chef.
Monitoring systems with Prometheus, Grafana, ELK stack, and running containers with Kubernetes.
Building APIs with FastAPI and supporting scalable, high-performance systems.
You may be assessed on the key critical skills relevant for success in role, such as risk and controls, change and transformation, business acumen, strategic thinking and digital and technology, as well as job-specific technical skills.
This role will be based in Knutsford.
Purpose of the role
To effectively monitor and maintain the bank’s critical technology infrastructure and resolve more complex technical issues, whilst minimising disruption to operations.
Accountabilities
Provision of technical support for the service management function to resolve more complex issues for a specific client or group of clients. Develop the support model and service offering to improve the service to customers and stakeholders.
Execution of preventative maintenance tasks on hardware and software and utilisation of monitoring tools/metrics to identify, prevent and address potential issues and ensure optimal performance.
Maintenance of a knowledge base containing detailed documentation of resolved cases for future reference, self-service opportunities and knowledge sharing.
Analysis of system logs, error messages and user reports to identify the root causes of hardware, software and network issues, and providing a resolution to these issues by fixing or replacing faulty hardware components, reinstalling software, or applying configuration changes.
Automation, monitoring enhancements, capacity management, resiliency, business continuity management, front office specific support and stakeholder management.
Identification and remediation or raising, through appropriate process, of potential service impacting risks and issues.
Proactively assess support activities implementing automations where appropriate to maintain stability and drive efficiency. Actively tune monitoring tools, thresholds, and alerting to ensure issues are known when they occur.
Assistant Vice President Expectations
To advise and influence decision making, contribute to policy development and take responsibility for operational effectiveness. Collaborate closely with other functions/ business divisions.
Lead a team performing complex tasks, using well developed professional knowledge and skills to deliver on work that impacts the whole business function. Set objectives and coach employees in pursuit of those objectives, appraise performance relative to objectives and determine reward outcomes.
If the position has leadership responsibilities, People Leaders are expected to demonstrate a clear set of leadership behaviours to create an environment for colleagues to thrive and deliver to a consistently excellent standard. The four LEAD behaviours are: L - Listen and be authentic, E - Energise and inspire, A - Align across the enterprise, D - Develop others.
OR for an individual contributor, they will lead collaborative assignments and guide team members through structured assignments, identify the need for the inclusion of other areas of specialisation to complete assignments. They will identify new directions for assignments and/ or projects, identifying a combination of cross functional methodologies or practices to meet required outcomes.
Consult on complex issues; providing advice to People Leaders to support the resolution of escalated issues.
Identify ways to mitigate risk and develop new policies/procedures in support of the control and governance agenda.
Take ownership for managing risk and strengthening controls in relation to the work done.
Perform work that is closely related to that of other areas, which requires understanding of how areas coordinate and contribute to the achievement of the objectives of the organisation sub-function.
Collaborate with other areas of work, for business aligned support areas to keep up to speed with business activity and the business strategy.
Engage in complex analysis of data from multiple sources of information, internal and external sources such as procedures and practices (in other areas, teams, companies, etc.) to solve problems creatively and effectively.
Communicate complex information. ‘Complex’ information could include sensitive information or information that is difficult to communicate because of its content or its audience.
Influence or convince stakeholders to achieve outcomes.
All colleagues will be expected to demonstrate the Barclays Values of Respect, Integrity, Service, Excellence and Stewardship - our moral compass, helping us do what we believe is right. They will also be expected to demonstrate the Barclays Mindset - to Empower, Challenge and Drive - the operating manual for how we behave.
Storage Engineer
Sky
Multiple locations
Hybrid
Mid
Private salary
RECENTLY POSTED
linux
windows
aws
prometheus
unity-3d
grafana
+3
We believe in better. And we make it happen.
Better content. Better products. And better careers.
Working in Tech, Product or Data at Sky is about building the next and the new. From broadband to broadcast, streaming to mobile, SkyQ to Sky Glass, we never stand still. We optimise and innovate.
We turn big ideas into the products, content and services millions of people love.
And we do it all right here at Sky.
We are seeking a highly skilled and experienced Senior Storage Engineer to join our dynamic team. The ideal candidate will have deep expertise in a wide range of storage technologies and systems, with the ability to work both independently and collaboratively within a global team environment.
What you’ll do
Deploy and maintain enterprise storage solutions using Dell PowerScale, Unity, and PowerVault.
Configure and manage SAN environments using Brocade or Cisco SAN switches.
Manage a large-scale, complex NAS environment based on PixStor, Ngenea, and Spectrum Scale (GPFS) components.
Develop and maintain automation scripts using Python and Bash to support storage operations and monitoring.
Administer storage systems across Linux and Windows platforms.
Ensure high availability, performance, and scalability of storage infrastructure.
Maintain comprehensive documentation for storage configurations, procedures, and operational standards.
Collaborate with cross-functional teams to support business and application requirements.
Monitor and troubleshoot storage performance issues and participate in incident resolution.
Support backup and archival processes and contribute to capacity planning.
What you’ll bring
Proven experience with Dell PowerScale, Unity, and PowerVault or similar SAN and NAS storage systems.
Hands-on expertise with Brocade or Cisco SAN switches.
Strong scripting skills in Python and Bash.
Solid understanding of Linux and Windows operating systems.
Ability to produce and maintain clear, structured technical documentation.
Strong problem-solving skills and attention to detail.
Desired Technical Skills
Experience with IBM Spectrum Protect (TSM), Spectrum Scale (GPFS), and Spectrum Archive.
Familiarity with NetBackup and tape library management.
Exposure to VMware virtualization technologies.
Understanding of AWS storage services and cloud integration.
Knowledge of monitoring tools such as Zabbix, Prometheus, and Grafana.
Experience with PixStor and Ngenea products is highly advantageous.
Other Requirements
A collaborative team player with excellent communication skills.
Demonstrated ability to share knowledge and support team learning.
Proactive and eager to learn new technologies and approaches.
Comfortable working in a dynamic, fast-paced environment.
Working from the office two days a week with occasional travel to other sites in the UK.
Team overview
Content technology and innovation
Our Content Technology and Innovation team delivers high-quality content to homes, customer devices, businesses and commercial partners across our European markets. With over 2500 colleagues from around the world, we combine our strategic insights, engineering know-how and operational excellence to use the most innovative technologies to create and distribute our award-winning content.
The rewards
There’s one thing people can’t stop talking about when it comes to Sky: the perks. Here’s a taster:
Sky Q, for the TV you love all in one place
The magic of Sky Glass at an exclusive rate
A generous pension package
Private healthcare
Discounted mobile and broadband
A wide range of Sky VIP rewards and experiences
Inclusion & how you’ll work
We are a Disability Confident Employer, and welcome and encourage applications from all candidates. We will look to ensure a fair and consistent experience for all, and will make reasonable adjustments to support you where appropriate. Please flag any adjustments you need to your recruiter as early as you can.
We’ve embraced hybrid working and split our time between unique office spaces and the convenience of working from home. You’ll find out more about what hybrid working looks like for your role later on in the recruitment process.
Your office space:
Osterley:
Our Osterley Campus is a 10-minute walk from Syon Lane train station. Or you can hop on one of our free shuttle buses that run to and from Osterley, Gunnersbury, Ealing Broadway and South Ealing tube stations. There are also plenty of bike shelters and showers.
On campus, you’ll find 13 subsidised restaurants, cafes, and a Waitrose. You can keep in shape at our subsidised gym, catch the latest shows and movies at our cinema, get your car washed and even get pampered at our beauty salon.
We’d love to hear from you
Inventive, forward-thinking minds come together to work in Tech, Product and Data at Sky. It’s a place where you can explore what if, how far, and what next.
But better doesn’t stop at what we do, it’s how we do it, too. We embrace each other’s differences. We support our community and contribute to a sustainable future for our business and the planet.
If you believe in better, we’ll back you all the way.
Just so you know: if your application is successful, we’ll ask you to complete a criminal record check. And depending on the role you have applied for and the nature of any convictions you may have, we might have to withdraw the offer.
SC cleared SRE - Openshift
fortice
Wokingham
Remote or hybrid
Mid
£450/day
RECENTLY POSTED
linux
prometheus
grafana
helm
python
bash
+1
SRE/SENIOR SRE/Site Reliability Engineer
Location: Wokingham (Reading) | Hybrid - 60% remote and 40% onsite
Duration: 30/01/2026 - possible extension
MUST BE PAYE THROUGH UMBRELLA
We are heading up a recruitment drive for a global consultancy that requires an SC cleared SRE to join them on a major government project that’s based remotely.
OpenShift Experience is a MUST
Collaborate with Agile teams to automate deployment, monitoring, and infrastructure management.
Ensure platform and business application reliability and performance against strict SLAs and KPIs.
Implement and maintain cloud-native observability stacks (Prometheus, Grafana, Loki, Tempo).
Develop and maintain Infrastructure as Code (IaC) using tools like Kustomize or Helm.
Manage CI/CD pipelines using Tekton and ArgoCD.
Support and troubleshoot OpenShift Operators (ServiceMesh, ODF, ACS, ACM, AMQ).
Conduct security reviews and implement controls aligned with national infrastructure standards.
Mentor junior engineers and promote SRE best practices.
Collaborate with vendors and IT teams for incident resolution and platform improvements.
Required Skills:
Strong communication skills (written and verbal).
Experience in remote team collaboration.
Deep expertise in OpenShift/Kubernetes and RedHat Linux.
Proficiency in scripting (Bash, Python) and templating (Helm, Kustomize).
Experience with CI/CD automation and IaC strategies.
Security-first mindset with experience in regulated environments.
Experience with VMware vSphere virtualization
Senior Backend Engineer (Platform & Security)
Ventula Consulting
London
Hybrid
Senior
£110k - £140k
RECENTLY POSTED
processing-js
aws
prometheus
redis
grafana
kafka
+4
We are seeking a deeply technical and security-minded Senior Backend Engineer to join a newly-founded, high-impact AI joint venture. Backed by five of the world’s leading telecommunications giants, our mission is to restore trust in global voice communication. We are building a new category of network-native B2C security, moving from reactive, in-call analysis to proactive, pre-call verification that stops fraud before it starts.
This is not a typical Back End role. You will be the Stability & Monetization Owner, the foundational engineer responsible for two of the most critical pillars of our platform: our monetization engine and our Zero Trust security architecture. You will build the systems that protect our users’ most sensitive data, including PII and biometric voiceprints, while navigating complex global compliance laws. You will also build the metering and entitlements engine that powers our entire freemium business model.
This position offers a unique opportunity to build a globally-scalable, highly secure, and compliant platform from the ground up. If you believe security and privacy are non-negotiable product features and you are excited by the challenge of building the commercial and trust-based foundation of a new product category, this role is for you.
Key Responsibilities Monetization & Platform Architecture
Design, build, and own the end-to-end metering and entitlements system that powers our freemium business model, targeting a 30% premium conversion rate.
Architect and manage our core database solutions (eg, PostgreSQL for user data, Redis for high-speed caching), ensuring high availability, scalability, and security for call logs and user-profiles.
Develop core, non-feature-specific Back End services such as user account management, settings, and asynchronous task processing via our AI Service Bus (eg, Apache Kafka).
Security & Trust Implementation
Serve as the primary engineering owner for implementing our application-layer Zero Trust architecture, working hand-in-glove with our DevOps and InfoSec partners.
Implement robust authentication and authorization services (eg, Role-Based Access Control) for all internal and external APIs, ensuring strict adherence to the Principle of Least Privilege (PoLP).
Implement industry-leading data security practices for handling highly sensitive PII and biometric data (voiceprints), ensuring end-to-end encryption for all data in transit (mTLS) and at rest (AES-256).
Compliance-as-Code & Observability
Collaborate with legal and product teams to translate complex global privacy laws (eg, Germany’s GDPR, South Korea’s PIPA, United Arab Emirates’s PDPL) into concrete engineering logic.
Build and maintain the Policy Engine that enables our platform to be jurisdiction-aware, dynamically managing data handling, consent flows, and feature-gating based on user location.
Partner with the DevOps Engineer to create comprehensive logging, monitoring, and analytics systems (eg, using Prometheus, Grafana, OpenTelemetry) to provide deep visibility into platform health, security events, and business KPIs.
Required Qualifications Education & Experience
Bachelor’s degree in Computer Science or a related technical field.
5+ years of hands-on experience in Back End engineering, building and maintaining high-availability, scalable services in a production environment.
Technical & Platform Skills
Deep proficiency and hands-on production experience in Go (Golang), our stack. Go is the required language for this role as we are standardizing on it for our core platform and security services.
Strong architectural knowledge of database solutions, including relational (eg, PostgreSQL) and caching (eg, Redis) systems.
Demonstrated experience building services on a major cloud platform (eg, AWS, Azure, GCP).
Strong understanding of distributed systems, microservices architecture, and API design.
Security & Compliance
A deep and demonstrable security-first mindset.
Hands-on experience implementing security best practices at the application layer, including authentication (eg, OAuth 2.0, JWT), authorization (RBAC), and data encryption.
Experience in securely handling and storing sensitive user data (PII).
This is a permanent position with hybrid working: two days a week in the central London office and the rest WFH. The salary is very much dependent on experience, with a guide of £110k-£140k basic + package.
AWS Data Engineer - AirFlow
Square One Resources
Manchester
Hybrid
Mid
£400/day
RECENTLY POSTED
aws
airflow
prometheus
terraform
github
grafana
+6
Job Title: Airflow/AWS Data Engineer
Location: Manchester Area (3 days per week in the office)
Rate: Up to £400 per day inside IR35
Start Date: 03/11/2025
Contract Length: Until 31st December 2025
Job Type: Contract
Company Introduction:
An exciting opportunity has become available with one of our sector-leading financial services clients. They are seeking a talented AWS DevOps/Data Engineer to join their growing data engineering function. This role will play a key part in designing, deploying, and maintaining modern cloud infrastructure and data pipelines, with a focus on Airflow, AWS, and data platform automation.
Key Responsibilities:
Deploy and manage cloud infrastructure across Astronomer Airflow and AccelData environments.
Facilitate integration between vendor products and core systems, including data lakes, storage, and compute services.
Establish and enforce best practices for cloud security, scalability, and performance.
Configure and maintain vendor product deployments, ensuring reliability and optimized performance.
Ensure high availability and fault tolerance for Airflow clusters.
Implement and manage monitoring, alerting, and logging solutions for Airflow and related components.
Perform upgrades, patches, and version management for platform components.
Oversee capacity planning and resource optimization for Airflow workers and AWS resources.
Manage integrations with source control systems (GitHub, GitLab) and CI/CD pipelines.
Collaborate with AWS teams and internal stakeholders for pipeline scalability and optimization.
Design and implement process improvements, including automation, data delivery optimization, and infrastructure re-design.
Develop ETL pipelines and data workflows using AWS and SQL technologies.
Partner with cross-functional teams (product, design, and leadership) to resolve technical issues and enhance platform capabilities.
Build analytical tools and dashboards to leverage data pipelines for actionable business insights.
Key Requirements:
Proven experience as an AWS DevOps Engineer or Data Engineer in complex cloud environments.
Strong hands-on expertise with AWS services (EC2, S3, Lambda, RDS, IAM, CloudWatch, etc.).
Demonstrated experience with Airflow (Astronomer) setup, orchestration, and optimization.
Proficiency in infrastructure as code (IaC) tools such as Terraform or CloudFormation.
Experience with CI/CD pipelines and tools like Jenkins, GitHub Actions, or GitLab CI.
Solid understanding of containerization technologies (Docker, Kubernetes).
Working knowledge of Python and SQL for automation and data pipeline development.
Familiarity with monitoring and observability tools (Grafana, Prometheus, CloudWatch).
Strong grasp of data architecture principles and ETL design patterns.
Financial services or regulated industry experience (desirable).
Performance Test Engineer - Python, Locust, automation
PCR Digital
Not Specified
Fully remote
Mid
£325/day - £350/day
python
prometheus
terraform
grafana
kubernetes
docker
+5
Performance Test Engineer (Python Automation for large-scale, low-latency, distributed systems, Remote Europe)
Location: Remote (Europe & UK only)
Full-Time 6-Month Contract Start Date: within 1-2 weeks
£350 per day, outside IR35 (TBC)
We’re seeking a hands-on Performance Test Engineer with strong Python and automation expertise to design, build, and execute the performance testing strategy for a high-scale ad-serving platform built on Akka-based Java microservices. You will be required to build automated load frameworks using Locust (Python).
You’ll work closely with developers and DevOps engineers to simulate realistic traffic at scale, ensure sub-50ms latency under millions of concurrent users, and drive system optimizations across cloud infrastructure and code. This is a technical, high-impact role ideal for someone passionate about distributed systems performance, automation, and data-driven tuning.
Profile:
3-5+ years of performance engineering for large-scale, low-latency, distributed systems.
Proven success meeting p95/p99 latency SLAs under high concurrency (millions of RPS).
Strong Python and automation expertise - able to design reusable, scalable test frameworks. Experience with distributed load testing and synthetic traffic modeling in the cloud.
Analytical, structured, and effective communicator with strong documentation and collaboration skills.
Based in EU or UK with English (C1 or higher).
Nice to have: Java, Bash scripting, Terraform.
Key Responsibilities:
Define and execute comprehensive performance test plans (load, stress, spike, soak, scalability, failover).
Model real-world streaming traffic patterns (burstiness, fan-out, cache behavior, cold-start, geo distribution).
Build automated load frameworks using Locust (Python) or JMeter, with data parameterization and correlation.
Manage distributed load generation (containers, cloud workers) to simulate millions of concurrent users.
Integrate performance metrics from CloudWatch, Prometheus, Grafana, and OpenTelemetry to analyze system bottlenecks.
Develop SLA/SLO dashboards and integrate performance gates into CI/CD pipelines.
Collaborate with DevOps and developers to tune JVM, Akka, thread pools, GC, caching, autoscaling, and database performance.
Document test approaches, scenarios, results, and provide clear, actionable tuning recommendations.
Tech Stack:
Load Tools: Locust (Python), JMeter; k6 or Gatling (nice to have).
Languages: Python, Bash, Java (Maven/Gradle, JVM tuning basics).
Infrastructure: Docker, Kubernetes, Terraform.
Observability: CloudWatch, Prometheus, Grafana, OpenTelemetry.
Architecture: Akka-based asynchronous Java microservices.
Logistics:
Start date: 17 November 2025.
Duration: 6 months (extension possible).
Employment type: Full-Time (Freelance allowed).
Location: Remote (Europe).
If you’re passionate about performance engineering and love optimizing systems that operate at global scale, we’d love to hear from you. Apply now and be part of an agile, innovative European tech team.
Everybody is welcome
Diversity and Inclusion Statement. PCR Digital
“At PCR Digital, we are committed to ensuring that diversity, equity and inclusion play a role at all stages of our recruitment - it is important to us that our own company culture and the culture of our network is as varied and supportive as possible. We love people (it’s why we do what we do), so, regardless of background, we welcome you to work with us or apply to any of our jobs if you feel that they are right for you.”
Platform Engineer
Damia Group Ltd
Blandford Forum
Hybrid
Mid
£500/day - £600/day
linux
windows
prometheus
terraform
git
ansible
+7
SC Cleared Platform Engineer - 6 month initial contract - Blandford, Dorset (3/4 days per week onsite) - £500-600p/d OUTSIDE IR35
Please note: Due to restrictions the candidate must be a UK National and hold SC clearance.
We are seeking a dynamic and experienced Platform Engineer to join our team. The ideal candidate will lead the design and implementation of CI/CD pipelines, IaC configuration, monitoring systems, repository management, automation, and other critical platform components. The Platform Engineer will ensure seamless integration of our offerings with the development team’s requirements. While the role may encompass a broad scope, including deployment automation and CI/CD pipeline configuration initially, it will gradually shift towards supporting the platform’s maintainability, supportability, security, and scalability/performance.
Key Responsibilities:
Develop IaC to support scalable and resilient application deployments.
Lead the design and construction of CI/CD pipelines to streamline the deployment process and enhance development efficiency.
Support on-premise platforms, ensuring consistency, reliability and maintainability through automation and Infrastructure as Code.
Implement robust monitoring solutions to ensure the health and performance of platform services.
Manage repository systems and version control to facilitate collaboration and code management.
Drive automation initiatives to optimize workflow processes and reduce manual intervention.
Collaborate with the Platform team to ensure alignment between platform capabilities and development team requirements.
Provide guidance and oversight to the development team on best practices for utilising platform offerings effectively.
Support initiatives related to platform maintainability, supportability, security, and scalability/performance as the platform evolves.
Monitor industry trends and emerging technologies to continually enhance platform capabilities and practices.
Qualifications and Experience:
Strong proficiency in the following DevOps/IaC tools - Ansible, Terraform, Packer and Vault.
Expertise with scripting languages such as Bash or PowerShell, as well as a good understanding of Linux and Windows operating systems.
Experience of containers and proficiency with Docker/Podman.
Strong expertise in designing and implementing CI/CD pipelines using tools such as Jenkins, GitLab CI/CD, or similar.
Experience with monitoring tools such as Prometheus, Grafana, or similar for infrastructure and application monitoring.
Knowledge of automating tasks with virtual systems such as VMware/ProxMox.
Solid understanding of version control systems, preferably Git, and repository management practices.
Proven track record of driving automation initiatives to improve operational efficiency and reliability.
Excellent communication and collaboration skills, with the ability to interact effectively with cross-functional teams.
Strong problem-solving skills and the ability to thrive in a fast-paced, dynamic environment.
Damia Group Limited acts as an employment agency for permanent recruitment and employment business for the supply of temporary workers. By applying for this job you accept our Data Protection Policy which can be found on our website.
Please note that no terminology in this advert is intended to discriminate on the grounds of a person’s gender, marital status, race, religion, colour, age, disability or sexual orientation. Every candidate will be assessed only in accordance with their merits, qualifications and ability to perform the duties of the job.
Damia Group is acting as an Employment Business in relation to this vacancy and in accordance with the Conduct Regulations 2003.
DevOps Engineer
Amtis Professional Ltd
Burton-on-Trent
Remote or hybrid
Mid
£60k - £65k
aws
prometheus
aws-cloudformation
terraform
github
git
+10
DevOps Engineer - Remote - 1 Day P/W Burton-on-Trent - £60,000 - £65,000 + Benefits
AWS, Azure, CI/CD, Terraform, Git, Python, ARM, Kubernetes
Role Overview
We are seeking a skilled DevOps Engineer to design, implement and maintain robust cloud infrastructure solutions across AWS and Azure platforms. This role plays a pivotal part in enabling continuous integration and delivery, ensuring system reliability, embedding security best practices, and actively contributing to team development through knowledge sharing.
Key Responsibilities
Design, deploy and manage scalable, secure infrastructure in AWS and Azure
Build and maintain CI/CD pipelines using tools such as Azure DevOps
Implement and manage monitoring, alerting and logging systems (e.g. Datadog, Logic Monitor, SolarWinds)
Automate infrastructure provisioning using Infrastructure as Code (IaC) tools such as Terraform
Ensure compliance with security policies; manage IAM, PIM and RBAC access controls
Respond to incidents and contribute to root cause analysis and post-mortem reviews
Create and maintain comprehensive documentation and runbooks
Collaborate with cross-functional teams to align DevOps practices with wider project goals
Ensure adherence to regulatory standards including CQC, GDP, NMC, GPhC, and ICO relevant to the role
Remain fully informed of responsibilities relating to Infection Prevention and Control
Technical Skills & Experience
Cloud Platforms - Hands-on experience with AWS and Azure. Any relevant certifications (e.g. AWS Architect, AZ-104, AZ-305)
DevOps & CI/CD - Strong grasp of DevOps principles. Experience with Azure DevOps, GitHub Actions, Jenkins. AZ-400 certification desirable
Containerisation - Experience with AKS/EKS, Proficiency in AWS CloudFormation or ARM templates
Scripting & Automation - Proficient in PowerShell, Bash, or Python
Infrastructure as Code (IaC) - Hands-on experience with Terraform, Bicep, or ARM. Terraform Associate certification preferred
Monitoring & Observability - Familiarity with tools like Azure Monitor, AWS CloudWatch, Prometheus, Grafana
Security & Compliance - Strong understanding of IAM, cloud security, compliance frameworks
Cloud Platform Expertise: Proven experience with AWS and Azure cloud platforms. AWS Certified Solutions Architect - Associate or Professional, Microsoft Certified: Azure Administrator Associate (AZ-104), or Microsoft Certified: Azure Solutions Architect Expert (AZ-305)
DevOps & CI/CD: Strong understanding of DevOps principles and hands-on experience with CI/CD tools like Azure DevOps, GitHub Actions, or Jenkins.
Azure Kubernetes Service: Proven experience designing and managing AKS clusters
Containerization: Docker, Kubernetes, Helm charts, and container orchestration
Azure DevOps: Advanced pipeline configuration for container builds and deployments
Additional certification: Microsoft Certified: Azure Kubernetes Service (AKS) Specialist or similar container-focused Azure cert
Azure Monitor for containers: Implement comprehensive monitoring for AKS workloads
Azure Key Vault integration: Secure secrets management for containerized applications
Azure Policy for Kubernetes: Implement governance and compliance for container workloads
Azure Arc: If relevant, managing hybrid/multi-cluster scenarios
Security & Compliance: Solid grasp of cloud security best practices, identity and access management, and compliance frameworks.
Collaboration & Mentorship: Excellent communication skills with a passion for mentoring, documentation, and enabling others through knowledge sharing.
For immediate consideration apply now!
Observability Developer/Engineer
VIQU IT
London
Hybrid
Mid
£40k - £75k
prometheus
grafana
python
splunk
nodejs
jira
+2
Job Title: Observability Developer / Engineer
Location: Hybrid (UK, with travel as required)
Employment Type: Full-time
This role is with Morela; please respond to (url removed) for further information.
Do you want to be part of something special? Morela is proud to represent our exclusive client, a fast-growing start-up transforming Service Operations. Led by industry leaders with a proven track record of building and scaling successful businesses, this company is redefining how enterprises monitor, manage, and optimise IT operations. This is your chance to join a team shaping the future of observability and operational intelligence from the ground up.
We are seeking a skilled Observability Developer to design, build, and optimise observability solutions that help enterprise clients gain actionable insights from their logs, metrics, traces, and events. In this role, you will reduce noise, improve reliability, and accelerate innovation by integrating monitoring platforms, ITSM tools, and AIOps engines while embedding observability best practices into delivery pipelines.
Key Responsibilities:
Design and implement observability pipelines across logs, metrics, events, and traces
Build integrations and automation between monitoring/alerting platforms, ITSM tools, and AIOps engines
Optimise alerting strategies to reduce noise and improve signal quality
Develop dashboards, visualisations, and reports for technical and business stakeholders
Deploy observability solutions in cloud and hybrid environments
Contribute to observability strategy and best practices within the Service Operations Framework
Collaborate with development, operations, and SRE teams to embed observability into the full delivery lifecycle
Skills & Experience:
Strong background in observability, monitoring, and event management
Hands-on experience with platforms such as Dynatrace, Datadog, AppDynamics, Splunk, Prometheus, Grafana, New Relic, or Elastic
Experience building integrations and automation using APIs, Python, Node.js, Go, or scripting
Familiarity with AIOps platforms (BigPanda, Moogsoft, etc.)
Knowledge of ITSM / incident management processes and tools (Halo ITSM, ServiceNow, Jira Service Management)
Cloud experience (AWS, Azure, GCP) and deploying observability tools in cloud-native environments
Understanding of OpenTelemetry and modern observability standards
Strong problem-solving skills and ability to work in a fast-paced start-up or consulting environment
Why Join:
Work with our exclusive client, a high-growth start-up backed by proven Service Operations leaders
Work on cutting-edge projects across multiple industries
Shape both client outcomes and the company’s frameworks and offerings
Thrive in a collaborative culture where ideas are valued, careers grow quickly, and impact is immediate
Sounds great, right? Don’t hesitate to apply today.
Cloud Services Engineer
Hays Technology
Manchester
Hybrid
Mid
£55k - £65k
prometheus
terraform
powershell
microsoft-azure
Prestigious opportunity for a Cloud Services Engineer with a pioneering market-leading organisation based in Manchester with hybrid working.
In this role, you will be responsible for designing, implementing, and maintaining cloud infrastructure and services that support our business operations. You will work across teams to ensure secure, scalable, and cost-effective cloud solutions, primarily within Microsoft Azure platforms.
Key Responsibilities:
Design, deploy, and manage cloud-based infrastructure and services.
Monitor system performance, availability, and security across cloud environments.
Automate infrastructure provisioning and configuration using Infrastructure as Code (IaC) tools (e.g., Terraform, ARM, CloudFormation).
Collaborate with development and DevOps teams to support CI/CD pipelines and cloud-native applications.
Implement and maintain backup, disaster recovery, and high availability strategies.
Ensure compliance with security policies and industry best practices.
Troubleshoot and resolve cloud-related issues and incidents.
Maintain documentation and provide technical support to internal teams.
If you possess a combination of some of the following skills, then LET’S TALK!
Hands-on experience with Azure cloud platforms.
Strong understanding of networking, virtualisation, and cloud security principles.
Operate, maintain, and enhance the Azure Virtual Desktop (AVD) environment.
Experience with monitoring and logging tools (e.g., Azure Monitor, CloudWatch, Prometheus).
Expert in setting up and managing host pools, session hosts, user access, application layers, and FSLogix profiles.
Strong knowledge of cloud architecture, design, and implementation principles and practices.
Proficiency in scripting and automation tools, such as PowerShell, Power Automate, Azure CLI, Azure DevOps, and Azure Monitor.
Experience working with Azure Foundry, Microsoft Copilot Studio and AI Agents.
What you’ll get in return
In return, you will be rewarded with ongoing career development and training in an enviable team environment.
What you need to do now
If you’re interested in this role, click ‘apply now’ to forward an up-to-date copy of your CV, or call us now.
If this job isn’t quite right for you, but you are looking for a new position, please contact us for a confidential discussion about your career.
Hays Specialist Recruitment Limited acts as an employment agency for permanent recruitment and employment business for the supply of temporary workers. By applying for this job you accept the T&C’s, Privacy Policy and Disclaimers which can be found at (url removed)

Frequently asked questions

What types of Prometheus jobs are available on this platform?
Our job board features a variety of Prometheus roles including Monitoring Engineer, DevOps Engineer, Site Reliability Engineer (SRE), and Cloud Infrastructure Specialist positions that require Prometheus expertise.
What skills are commonly required for Prometheus job listings?
Commonly required skills include proficiency with Prometheus for metrics collection and monitoring, experience with Grafana for dashboards, knowledge of alerting rules, familiarity with Kubernetes and cloud platforms, and strong Linux system administration skills.
Can I filter Prometheus job listings by experience level?
Yes, you can filter job listings by experience level such as entry-level, mid-level, and senior roles to find Prometheus positions that match your career stage.
Are remote Prometheus jobs available on your site?
Absolutely. Many companies post remote or hybrid Prometheus jobs on our platform. You can easily filter your search results to find remote opportunities.
How often are new Prometheus job listings posted?
New Prometheus job listings are added daily as companies continuously seek monitoring and DevOps professionals skilled with Prometheus.