PySpark Jobs

Make yourself visible and let companies apply to you.

Roles

PySpark Jobs

Overview

Looking for top PySpark jobs? Explore the latest PySpark developer positions on Haystack, your go-to IT job board for data engineering roles. Find exciting opportunities to work with big data, Apache Spark, and advanced analytics today!

Data Engineer - Birmingham

Data Engineer

Birmingham, UK | Hybrid

At Compass Group UK&I, we’re more than just the UK’s leading contract catering company - we’re driving digital transformation across the business. Our Digital & Technology team is at the heart of this journey, creating cutting-edge solutions that improve efficiency, elevate customer experiences, and deliver real business impact.

We’re looking for a Data Engineer to build and maintain data engineering solutions that power analytics, reporting, and decision-making across our organisation.

This is not a junior role. We’re looking for someone who can work with real autonomy - taking ownership of pipelines end-to-end, contributing to technical design, and supporting the engineers around them - while continuing to grow their craft on a modern, cloud-first platform.

You’ll work alongside senior engineers, business stakeholders, and analytics teams, making sure the data that flows through our business is accurate, timely, and built to last.

What You’ll Be Responsible For

Developing and deploying scalable data engineering solutions on the Databricks Lakehouse platform using PySpark, Spark SQL, and Python
Building batch pipelines that feed our Discovery Analytics platform, powered by Power BI, with accurate and reliable data
Contributing to CI/CD practices - automated testing, code reviews, and deployment pipelines that keep our engineering bar high
Supporting and mentoring junior engineers, sharing knowledge and helping the team grow
Collaborating with stakeholders to understand requirements and translate them into well-built, maintainable solutions
Monitoring and tuning pipelines to keep performance sharp and cloud costs in check

What We’re Looking For

A business-aware mindset - someone who understands how good data engineering creates real value, not just technical output
Solid hands-on experience with Databricks, Delta Lake, PySpark, and Spark SQL
Strong Python and SQL skills with a focus on clean, maintainable, production-ready code
Working knowledge of CI/CD practices including Git, automated testing, and deployment pipelines
Good understanding of data modelling, dimensional design, and data warehouse concepts
Experience with AWS and cloud-native data engineering patterns
Clear communicator who can work effectively with both technical colleagues and business stakeholders
Exposure to Apache Airflow, Databricks Workflows, or Informatica Cloud is a bonus - as are Databricks or AWS certifications

Why Join Us?

You’ll be part of the team building the data engineering foundations that underpin the UK’s largest catering business.

We’re mid-transformation - moving from legacy workflows to a cloud-first, Databricks-powered platform - and there’s real work to be done by engineers who care about doing it properly.

If you want to take on more ownership, work on problems that matter at scale, and grow alongside a team that’s genuinely raising the bar, we’d love to hear from you.

#REED

Data Engineer - Birmingham

Data Engineer

Birmingham, UK | Hybrid

We’re looking for a Data Engineer to build and maintain data engineering solutions that power analytics, reporting, and decision-making across our organisation.

You’ll work alongside senior engineers, business stakeholders, and analytics teams, making sure the data that flows through our business is accurate, timely, and built to last.

What You’ll Be Responsible For

Developing and deploying scalable data engineering solutions on the Databricks Lakehouse platform using PySpark, Spark SQL, and Python
Building batch pipelines that feed our Discovery Analytics platform, powered by Power BI, with accurate and reliable data
Contributing to CI/CD practices - automated testing, code reviews, and deployment pipelines that keep our engineering bar high
Supporting and mentoring junior engineers, sharing knowledge and helping the team grow
Collaborating with stakeholders to understand requirements and translate them into well-built, maintainable solutions
Monitoring and tuning pipelines to keep performance sharp and cloud costs in check

What We’re Looking For

A business-aware mindset - someone who understands how good data engineering creates real value, not just technical output
Solid hands-on experience with Databricks, Delta Lake, PySpark, and Spark SQL
Strong Python and SQL skills with a focus on clean, maintainable, production-ready code
Working knowledge of CI/CD practices including Git, automated testing, and deployment pipelines
Good understanding of data modelling, dimensional design, and data warehouse concepts
Experience with AWS and cloud-native data engineering patterns
Clear communicator who can work effectively with both technical colleagues and business stakeholders
Exposure to Apache Airflow, Databricks Workflows, or Informatica Cloud is a bonus - as are Databricks or AWS certifications

Why Join Us?

You’ll be part of the team building the data engineering foundations that underpin the UK’s largest catering business.

We’re mid-transformation - moving from legacy workflows to a cloud-first, Databricks-powered platform - and there’s real work to be done by engineers who care about doing it properly.

If you want to take on more ownership, work on problems that matter at scale, and grow alongside a team that’s genuinely raising the bar, we’d love to hear from you.

#REED

AWS Data Engineer

Real

Yorkshire And The Humber

I am supporting a university with a major data platform transformation project as they implement AWS across their environment.

We are looking for a Data Engineer with strong hands‑on experience in designing and delivering enterprise‑scale data pipelines using AWS Glue and PySpark. The role will involve building and optimising ETL processes, working with raw and curated datasets, and ensuring data is processed efficiently and to a high standard.

You will be responsible for developing scalable, production‑grade data workflows, integrating data from multiple systems, and applying best practices around data modelling, data quality, and automation. Experience working within a modern cloud data stack is essential, along with an understanding of how to structure data for analytics, reporting and downstream consumption.
The ideal candidate will have a solid background in Spark‑based engineering, particularly PySpark, and be confident working with Glue jobs, Glue Catalog, S3, and other AWS native services used within a data platform build.

Location: Remote (client based in North East England)
Rate: £500- £600 per day
IR35: Inside IR35, must use an approved umbrella on our list
Duration: approx 3 months
Start date: ASAP

Please click to find out more about our Key Information Documents. Please note that the documents provided contain generic information. If we are successful in finding you an assignment, you will receive a Key Information Document which will be specific to the vendor set-up you have chosen and your placement.

To find out more about Real, please visit

Real Staffing, a trading division of SThree Partnership LLP is acting as an Employment Business in relation to this vacancy | Registered office | 8 Bishopsgate, London, EC2N 4BQ, United Kingdom | Partnership Number | OC(phone number removed) England and Wales

Senior Technical Lead

Stackstudio Digital Ltd.

Job Title: Senior Technical Lead
Location: Norwich, Norfolk (3 Days a week)
Job Type: Contract (Inside IR35)
Duration: 6 Months The Role

We are looking for a Senior Technical Lead who combines hands-on engineering excellence with strong leadership and stakeholder management. You will own the end-to-end technical delivery of data platforms and pipelines built in AWS-with a focus on AWS Glue, Managed Workflows for Apache Airflow (MWAA), and Python-and collaborate closely with Directors, Senior Architects, and Program Leadership to deliver business outcomes at scale.

This is a player-coach role: you will design, build, review, and optimize complex data workflows while mentoring engineers and driving engineering best practices.

Ideal for: Someone who has delivered multiple production programs in a modern AWS data engineering landscape, can communicate trade-offs clearly to senior stakeholders, and can lead teams through ambiguity to predictable, high-quality outcomes.

Your Responsibilities:

Lead the design and implementation of scalable, secure, and cost-efficient ETL/ELT pipelines using AWS Glue, Python (PySpark), and MWAA (Airflow).
Define solution architectures, data models, orchestration patterns, and CI/CD for data workflows.
Own the technical roadmap, decomposition, and delivery plan-including sizing, sprint planning, and risk mitigation.
Drive performance optimization (e.g., partitioning strategies, Glue job tuning, job bookmarks, dynamic frames vs DataFrames, retry/backoff strategies in Airflow).
Ensure robust observability (logging, metrics, tracing) and data quality (unit tests, Great Expectations/Deequ-style checks, validations).
Act as the technical point of contact for Senior Architects, and Program Managers; translate business needs into technical designs and delivery milestones.
Present architecture decisions, trade-offs, and TCO to senior stakeholders with clarity, data, and rationale.
Manage vendor/partner coordination where relevant.
Establish coding standards, code review practices, branching strategies, and secure-by-design principles.
Implement DevSecOps for data: infrastructure-as-code (IaC), secrets management, environment promotion, and automated testing.
Ensure compliance with data governance, security, and regulatory requirements (e.g., PII/PCI, encryption, auditability, lineage).
Mentor and upskill engineers; foster a culture of learning, ownership, and continuous improvement.

Your Profile

Essential skills/knowledge/experience:

10+ years of total experience in software/data engineering, with 5+ years leading delivery of production solutions in an AWS data engineering environment.

Advanced hands-on expertise with:

Python (including PySpark & data engineering patterns)
AWS Glue (Jobs, Crawlers, Glue Studio, Glue Catalog, PySpark, Job bookmarks)
MWAA (Apache Airflow) (DAG design, scheduling, sensors, retries, XComs, task isolation, best practices)

Strong across broader AWS services:

S3, Lambda, Step Functions, IAM, CloudWatch, KMS, Secrets Manager, Athena, EMR (nice to have), Redshift (nice to have)
Proven experience delivering multiple end-to-end programs (architecture build test deploy operate) with measurable outcomes (SLAs, cost targets, performance).
Excellent stakeholder communication and executive presence; able to engage Directors, Senior Architects, and Program Leadership.
Solid grounding in data modeling, data governance, security/compliance, and cost optimization on AWS.
Experience with CI/CD (e.g., CodePipeline/GitHub Actions/Bitbucket Pipelines), IaC (CloudFormation/Terraform), and containerization (Docker).
Architectural thinking: designs for scale, reliability, cost, and evolvability.
Delivery excellence: breaks down complex work, sets milestones, manages risks, and delivers on time.
Communication & influence: distills complexity for senior stakeholders; backs decisions with data.
Hands-on leadership: sets the technical bar through reviews, pairing, and exemplars.
Ownership & clarity: aligns teams on problem statements, success criteria, and measurable outcomes.

Languages:

Python (PySpark), SQL

AWS:

Glue, MWAA (Airflow), S3, IAM, KMS, CloudWatch, Lambda, Step Functions, Athena, Redshift (nice), EMR (nice)

DevOps:

Git, CI/CD (CodePipeline/GitHub Actions), Terraform/CloudFormation, Docker

Data Quality/Observability:

Great Expectations/Deequ (nice), OpenLineage (nice)

Desirable skills/knowledge/experience:

Domain experience in BFSI (risk, pricing, regulatory reporting, underwriting, fraud, payments, or actuarial data).
Experience with event-driven and near-real-time pipelines (Kafka/Kinesis, streaming ETL).
Knowledge of data quality frameworks (Great Expectations, Deequ) and data lineage/catalog (Atlas, Alation, Collibra).
Exposure to Databricks or EMR for advanced Spark workloads.
Certifications: AWS Solutions Architect / Data Analytics / DevOps Engineer.
Prior experience leading multi-team programs with offshore/nearshore models.

We are Data Services, our mission is to unlock the value of data by delivering high-quality, reliable, and secure data services that are accessible, understandable, and actionable. We continuously evolve our offerings, leveraging modern cloud-based technologies, and fostering strong partnerships to help our colleagues in the Bank navigate the complexities of a data-driven world and achieve their strategic objectives.

Active SC Clearance

Job Description:

The world of data in Central Banking is evolving rapidly. With the rise of detailed data collection in financial regulation and the swift advancements in cloud-native data technologies, the demand for visionary data engineers is growing. We’re seeking a senior Data Engineer to join our Data Engineering team and play a pivotal role in shaping the Bank’s strategic cloud-first data platform.

As a senior member of the team, you will play a key role in designing and delivering robust, scalable data solutions that support the Bank’s core responsibilities around monetary policy, financial stability, and regulatory supervision. You’ll contribute to technical design decisions, mentor engineers, and collaborate across teams to ensure our data infrastructure continues to evolve and meet future demands.

Role Responsibilities

* Lead the design, development, and deployment of scalable, secure, and cost-effective distributed data solutions using Azure services (e.g., Azure Databricks, Azure Data Lake Storage, Azure Data Factory).

* Architect and implement advanced data pipelines using Databricks, Delta Lake, Python and Spark, ensuring performance, reliability, and maintainability across cloud and on-prem environments.

* Champion data quality, governance, and observability, ensuring data is accurate, timely, and fit-for-purpose for analytics, BI, and operational use cases.

* Drive the modernization of legacy systems, leading the migration of data infrastructure to Azure with minimal disruption and long-term scalability.

* Act as a technical authority on Azure-native data engineering, guiding best practices and setting standards across the team.

* Mentor and coach junior and mid-level engineers, fostering a culture of continuous learning, innovation, and technical excellence.

* Collaborate with architects, analysts, and stake holders to align data engineering efforts with strategic business goals and enterprise data strategy.

* Evaluate and introduce emerging technologies, tools, and methodologies to enhance the Bank’s data capabilities.

* Own the end-to-end delivery of complex data solutions, from requirements gathering to production deployment and support.

* Contribute to the development of reusable frameworks, templates, and patterns to accelerate delivery and ensure consistency across projects.

Minimum Criteria

* Extensive experience with Azure services including Azure Databricks, Azure Data Lake Storage, and Azure Data Factory.

* Advanced proficiency in SQL, Python, and Spark (PySpark), with a strong focus on performance optimization and distributed processing.

* Proven experience in CI/CD practices using industry-standard tools (e.g., GitHub Actions, Azure DevOps).

* Strong understanding of data architecture principles and cloud-native design patterns.

Essential Criteria

* Demonstrated ability to lead technical delivery, mentor engineering teams and collaborate with stakeholders to ensure alignment between data solutions and business strategy.

* Proficiency in Linux/Unix environments and shell scripting.

* Deep understanding of source control, testing strategies, and agile development practices.

* Self-motivated with a strategic mindset and a passion for driving innovation in data engineering.

Desirable Criteria

* Experience delivering data pipelines on Hortonworks/Cloudera on-prem and leading cloud migration initiatives.

* Familiarity with: Apache Airflow

* Data modelling and metadata management

* Experience influencing enterprise data strategy and contributing to architectural governance

Remote (UK) 1 2 days per month in the South Manchester area.
Up to £46,000 + flexibility for standout candidates

If you re a Data Engineer who wants more than just maintaining pipelines, this is a chance to shape something from the ground up.

This role sits at the heart of a business investing heavily in its data capability. They re moving away from reactive reporting towards a modern, insight-led platform and they need someone who wants to be part of that journey, not just observe it.

The Opportunity

You ll join at a pivotal moment.

The current data platform is being rebuilt, transitioning from legacy and mixed approaches into a modern, cloud-first architecture built on Azure Synapse and a medallion (bronze silver gold) design.

This isn t a keep the lights on role. You ll be:

Designing and building scalable data pipelines from scratch
Shaping how data is structured, governed, and used across the business
Influencing technical direction and bringing in better ways of working
Helping the organisation move towards predictive and insight-driven decision making

You ll have genuine ownership and the space to challenge existing approaches and introduce best practice.

What You ll Be Doing

Building and optimising data pipelines using Azure Synapse (pipelines and notebooks)
Writing and maintaining robust SQL, including complex stored procedures
Designing and implementing modern data warehouse architecture (medallion model)
Ensuring data quality, validation, and reliability across the platform
Integrating data from APIs and multiple internal sources
Collaborating with stakeholders to turn data into something genuinely useful

What We re Looking For

You don t need to tick every box, but there are a few things that really matter:

Must-have:

Strong SQL skills, including writing advanced stored procedures
Hands-on experience with Azure Synapse (pipelines and notebooks)
Experience building or working within modern data warehouse architectures
Ability to get up and running quickly without heavy onboarding

Nice to have:

Python, PySpark or Spark experience
Exposure to APIs and external data integration
Experience working in evolving or transforming data environments

Most importantly, you ll be someone who:

Wants to improve and evolve things, not just maintain them
Is comfortable challenging existing approaches constructively
Enjoys solving problems and taking ownership

Why This Role?

There are plenty of Data Engineer roles out there. This one stands out because of the impact and trajectory.

Build, not maintain genuine opportunity to shape a modern data platform

Influence your ideas and approach will matter from day one

Growth strong focus on learning, with access to training and new technologies

Variety a mix of engineering, problem-solving, and collaboration

Flexibility remote-first, with occasional in-person team time

You ll also be joining a team that values curiosity, improvement, and doing things properly rather than just quickly.

Location & Flexibility

Remote-first role across the UK
1 2 days per month in the Manchester area
Open to candidates across the North West, and beyond

A Note on Fit

This role won t suit someone looking for routine, maintenance-only work.

It will suit someone who wants to:

Build

Improve

Challenge

And leave things better than they found them

If you re looking for a role where you can genuinely influence a data platform and grow with it, this is well worth a conversation.

Tilt Recruitment are specialists in IT Recruitment. We work hard to find our candidates their perfect roles within fantastic organisations across the UK. If this role isn t right for you, please still get in touch with us as we may have other roles which may suit you better.

We also offer up to £500 for every successful referral, if you know someone who matches this skill set please let us know.

Tilt Recruitment is acting as an Employment Agency in relation to this vacancy

Senior Backend Engineer

Senior Backend Engineer (Platform & Cloud)
Location: London (Hybrid)
Contract: 6–12 months
Rate: Negotiable (DOE)

We’re working with a global enterprise organisation that is expanding a centralised engineering function supporting multiple product teams across an international portfolio.

They’re looking for a Senior Backend Engineer to join a core platform team responsible for building shared services, libraries, and cloud-based functionality used across a wide range of internal products.

This is a highly collaborative role, working directly with senior engineers and technical leaders across multiple time zones, helping shape the foundations of how teams build, scale, and reuse backend capability.

The Role

You’ll be part of a central engineering group building common functionality for distributed product teams. Early work will focus on:

Abstracting existing codebases into shared Python libraries
Developing data-driven solutions using PySpark and DataFrames
Building and extending Python-based microservices using Azure Functions
Creating scalable, reusable services that solve common challenges across teamsBeyond the initial phase, you’ll act as an extension of the product teams — designing and delivering new services, libraries, and architectural patterns to support platform growth.

Strong communication is essential, as you’ll be working closely with engineers and technical leads across five time zones.

Essential skills

Strong commercial experience with Python
PySpark and data-frame based processing
Solid SQL capability
Experience working with Azure infrastructure
Good understanding of containers, microservices, and functional design patterns
Comfortable working in Agile environments
Experience using Terraform for infrastructure as code
Strong approach to unit testing (ideally with PyTest)Nice to have

FastAPI
React / TypeScript
HTML / CSSWhy this role?
Work on a central platform team with genuine architectural influence
Build solutions that are used across multiple products and regions
Long-term contract with strong extension potential
Hybrid working in London
High-impact role in a complex, enterprise-scale environmentIf you’re a backend engineer who enjoys building platforms, shared services, and scalable cloud solutions — this is an amazing opportunity for you!

Interested? Please apply now with your updated CV and reach out to Tom Johnson at Certain Advantage - Ref: 79927

Data Engineering Manager

Are you a hands on Data Engineering leader who loves building high impact data platforms while mentoring and growing teams?

We’re hiring a Data Engineering Manager to join a fast moving, product focused environment where collaboration is high, decisions are quick, and your work will directly shape real products used by the business.

The Role

This is a true player coach position, roughly 50% hands on engineering and 50% team leadership.

You’ll lead a small but growing team of Data Engineers (currently 5, mostly junior), helping them mature technically while remaining deeply involved in building and optimising modern data pipelines on Databricks.

The team works in a highly collaborative office environment, enabling rapid delivery and close cross functional teamwork.

What You’ll Be Doing

Leading and mentoring a team of Data Engineers
Designing and building scalable data pipelines in Databricks
Remaining hands on with Python/PySpark development
Working closely with product, Front End, and Back End teams
Integrating multiple data sources, including APIs
Driving best practice across the data platform
Helping shape the future data architecture

What We’re Looking For

Essential:

Strong commercial experience with Databricks
Advanced Python and PySpark
Solid SQL skills
Experience building production data pipelines
Experience working with API based data ingestion
Familiarity with Azure storage connectivity
Prior technical leadership or mentoring experience

Nice to have:

Experience with Dynamics integrations
Background in product or startup environments
Broader Azure ecosystem exposure

Working Pattern

Predominantly onsite role
Expected 5 days/week initially to embed with the team
Increased flexibility likely after initial onboarding
Fast paced, highly collaborative office culture

Why Join?

High impact role in a growing data function
Strong investment in people and development
Free onsite restaurant (breakfast, lunch & snacks)
Fully equipped onsite gym
Collaborative, delivery focused culture
Opportunity to shape and scale a modern Databricks platform

If you are looking for your next exciting opportunity please apply today and I will be in touch!

Data Engineer

Opus Recruitment Solutions

Data Engineer | Outside IR35 | £450 - £500 | 6 months | Hybrid London We’re supporting a company who are looking for a Data Engineer to build and enhance the data processing capabilities within their Databricks environment. You’ll be responsible for developing the code that drives their data pipelines, using Python, Spark, and Databricks Workflows to deliver new platform functionality and ensure efficient execution. Key Responsibilities Develop reliable Python and PySpark code to support data ingestion, transformation, and end‑to‑end processing. Deliver new technical features and components aligned to approved solution designs and business requirements. Enhance, extend, and tune existing data frameworks to support additional use cases and improved performance. Create, manage, and optimise Databricks Workflows, including orchestration logic and operational behaviours. Carry out testing, performance tuning, and provide day‑to‑day operational support for data pipelines. Work closely with Solution Designers / Architects and Configuration Analysts to ensure consistent and effective delivery. If this is a role that suits your skillset, can work onsite 2 days per month and immediately available then please apply for the job advert directly or reach out to myself at (url removed). Data Engineer | Outside IR35 | £450 - £500 | 6 months | Hybrid London

Azure Data Engineer

Opus Recruitment Solutions

A large‑scale data transformation programme is underway, and our client is looking for an experienced Azure Data Engineer to support the rebuild of their cloud data platform. This role is hands‑on and delivery‑focused — you’ll be designing and developing Azure‑native data pipelines, working extensively with Databricks, and shaping scalable data models across the Microsoft ecosystem. The role would require you to be on site in Bristol 4 days per week, please only apply for this position if you are local enough to do this without relocating.

What you’ll be doing

Build, enhance and maintain data pipelines using Azure Databricks, Data Factory, and Delta Lake

Develop and optimise Lakehouse components and cloud‑based data flows

Create robust data models to support analytics, MI and downstream reporting

Assist in migrating legacy warehouse assets into a modern Azure environment

Contribute to cloud architecture decisions, data standards and best‑practice engineering patterns

Develop reliable Python and PySpark code to support data ingestion, transformation, and end‑to‑end processing.

What you’ll bring

Strong hands‑on experience across Azure Data Services (ADF, ADLS, Synapse, Databricks)

Excellent SQL skills, with experience in performance tuning and optimisation

Solid understanding of data modelling (star schema, medallion, ETL frameworks)

Ability to work with complex, inconsistent or legacy data sources

Experience building scalable, production‑ready pipelines in a cloud environment

Azure Data Engineer | £400 - £500 Outside IR35 | Bristol | Hybrid | 6‑Month Initial Term

Data Engineer

Randstad Technologies Recruitment

Data Engineer (II) | Snowflake & Microsoft Fabric Location: London Work Mode: Hybrid(1-2 days/week in Office) Duration: 6-Month Initial Contract Start Date: ASAP We are seeking a proactive Data Engineer (II) to build and scale high-performance data pipelines across Microsoft Fabric (OneLake) and Snowflake. This role is central to powering our BI, analytics, and AI/ML initiatives. The Role Build & Integrate: Develop complex ETL/ELT pipelines and implement OneLake interoperability between Fabric and Snowflake. Architect: Operate a Medallion architecture (Bronze/Silver/Gold) and design curated datasets optimized for Power BI. Secure: Implement rigorous data security, GDPR compliance, and client ring-fencing (Row-Level Security). Innovate: Create SQL and PySpark notebooks to prepare feature engineering datasets for AI/ML models. Collaborate: Work independently via Jira/GitHub, maintaining high CI/CD and documentation standards. What You Bring 5+ years of Data Engineering experience. Expertise: Deep knowledge of Snowflake and Microsoft Fabric. Tech Stack: Advanced SQL, Python, and PySpark (Notebooks). Modelling: Strong relational data modelling and cloud cost optimization skills. DevOps: Experience with GitHub, CI/CD, and workflow orchestration. Randstad Technologies is acting as an Employment Business in relation to this vacancy

Data / Machine Learning Ops Engineer

Location: Erskine, Scotland (Hybrid 2/3 days per week in the office)
Candidates must be eligible for clearance.

DXC Technology (NYSE: DXC) is a leading independent, end-to-end IT services company, helping organisations harness innovation to thrive through change. Serving nearly 6,000 private and public sector clients across 70 countries, DXC combines technology independence, global talent, and an extensive partner network to deliver next-generation IT services and solutions.

We are proud to be recognised globally for corporate responsibility and inclusive workplace practices.

The Role

Are you passionate about bringing machine learning solutions into real-world production environments? Do you enjoy collaborating with others to build scalable, reliable systems?

We are looking for a Machine Learning Ops Engineer to join our growing team. This role is ideal for someone who enjoys solving complex problems, working cross-functionally, and continuously developing their technical expertise in a supportive environment.

If you dont meet every single requirement listed below, we still encourage you to apply. We value potential, curiosity, and a willingness to learn.

What Youll Be Doing

Deploying, monitoring, and scaling machine learning models in production.
Collaborating with data scientists, engineers, and stakeholders to integrate AI solutions into scalable products.
Supporting the full ML lifecycle, from experimentation to deployment and optimisation.
Applying best practices in data engineering and contributing to architectural decisions.
Using modern MLOps tools and CI/CD approaches to improve reliability and efficiency.
Contributing to a culture of knowledge-sharing and continuous improvement.

Technical Experience

Were looking for experience in many of the following areas:

Strong Python skills and familiarity with ML libraries such as Pandas, NumPy, and scikit-learn.
Experience with frameworks such as TensorFlow, Keras, or PyTorch.
Exposure to gradient boosting tools such as XGBoost, LightGBM, or CatBoost.
Experience with model deployment tools (e.g., ONNX, TensorRT, TensorFlow Serving, TorchServe).
Familiarity with ML lifecycle tools such as MLflow, Kubeflow, or Azure ML Pipelines.
Experience working with distributed data processing (e.g., PySpark) and SQL.
Understanding of software engineering best practices, including version control (Git).
Knowledge of CI/CD principles in ML environments.
Experience with cloud-native ML platforms is advantageous.

What Were Looking For

A collaborative mindset and strong communication skills.
A thoughtful, structured approach to problem solving.
A commitment to continuous learning and professional growth.
The confidence to contribute ideas while valuing diverse perspectives.

Why Join Us?

Work on meaningful AI projects with real-world impact.
Join a supportive, forward-thinking team that values inclusion and diverse perspectives.
Access structured learning, mentoring, and career development opportunities.
Flexible hybrid working arrangements.
A workplace culture that supports wellbeing and work-life balance.

What We Offer

Competitive salary.
Pension scheme.
DXC Select comprehensive benefits package including private medical insurance, gym membership, and more.
Perks at Work discounts on technology, groceries, travel and more.
DXC incentives recognition tools, employee lunches, and regular social events.

Ready to Shape the Future of AI?

We are committed to building diverse teams and creating an inclusive environment where everyone can thrive. If this role excites you, wed love to hear from you.

Apply today and bring your skills, perspective, and ambition to a team that values innovation, collaboration, and growth.

Palantir Foundry Data Engineer X2

Role: Palantir Foundry Data Engineer Location: Remote working Day rate: £400pd-£500pd Contract: 3 month initial We are currently recruiting for an experienced Data Engineer with strong, hands-on expertise in Palantir Foundry to design, build, and optimize scalable data pipelines, semantic models, and data products. In this role, you will work closely with data scientists, analysts, product teams, and business stakeholders to deliver robust, production-grade data foundations that support analytics, automation, and operational decision-making. You will play a key part in shaping the data ecosystem, ensuring reliability, performance, and long-term sustainability. Skills and experience required Strong experience in data engineering Proven, hands-on experience working with Palantir Foundry in a production environment. Strong proficiency in Python, SQL, PySpark and Spark SQL Experience delivering production pipelines in AWS, Azure, or GCP environments. Solid understanding of data modeling, schema design, metadata management, and governance. Familiarity with CI/CD, Git-based workflows, and software engineering best practices. This will be a remote working opportunity, which may require occasional travel to client site, please consider this when applying for the role. If you are interested in the role and would like to apply, please click on the link for immediate consideration

Lead Platform Engineer

Lead Data Platform Engineer - Databricks - IAC - Terraform - Azure Data Factory - Data Lakehouse

The Data Platform Engineer designs, develops, automates, and maintains secure, scalable, and compliant data platforms that enable the firm to efficiently manage, analyse, and utilise data. The role ensures that data solutions are robust and reliable while meeting regulatory obligations and safeguarding client confidentiality.

Key Responsibilities

Design and architect scalable, secure, and compliant data platforms and solutions, producing technical documentation and securing approvals through governance bodies such as Architecture Review Boards.
Build and deliver robust data solutions using Databricks, PySpark, Spark SQL, Azure Data Factory, and Azure services.
Develop APIs and write efficient Python, PySpark, and SQL code to support data integration, processing, and automation.
Implement and manage CI/CD pipelines and automated deployments using Azure DevOps to enable reliable releases across environments.
Develop and maintain infrastructure-as-code (eg, Terraform, ARM) to provision and manage cloud resources, including ADF pipelines, Databricks assets, and Unity Catalog components.
Monitor, troubleshoot, and optimise data platform performance, reliability, and costs, identifying bottlenecks and recommending improvements.
Create dashboards and observability tools to report on platform performance, usage, incidents, and operational KPIs.

Knowledge, Skills & Experience

Degree in Computer Science, Data Engineering, or a related field.
Proven experience designing and building cloud-based data platforms, ideally within Azure.
Strong hands-on expertise with Databricks, PySpark, Spark SQL, and Azure Data Factory.
Solid understanding of Data Lakehouse architecture and modern data platform design.
Proficiency in Python for data engineering, automation, and data processing.
Experience developing and integrating REST APIs for data services.
Strong DevOps experience, including CI/CD, automated testing, and release management for data platforms.
Experience with Infrastructure as Code tools such as Terraform or ARM templates.
Knowledge of data modelling, ETL/ELT pipelines, and data warehousing concepts.
Familiarity with monitoring, logging, and alerting tools (eg, Azure Monitor).

Desirable

Experience with additional Azure services (eg, Fabric, Azure Functions, Logic Apps).
Knowledge of cloud cost optimisation for data platforms.
Understanding of data governance and regulatory compliance (eg, GDPR).
Experience working in regulated or professional services environments.

Senior Data Engineer - Microsoft Fabric

Roc Search Europe Limited

We’re looking for an experienced Senior Data Engineer to join a growing team building a modern Microsoft Fabric data platform. This is a hands-on role designing and delivering scalable data pipelines, Lakehouse solutions, and analytics models within the Azure ecosystem.

What You’ll Do:

Build and maintain ETL/ELT pipelines and data models in Fabric (Data Factory, Notebooks, Spark)
Write high-performance Spark SQL, T-SQL, Python/PySpark
Manage ingestion, transformation, and loading from multiple sources
Translate stakeholder requirements into scalable technical solutions
Mentor team members and establish engineering standards, security, and governance
Leverage AI-assisted development tools like GitHub Copilot, ChatGPT, and Fabric Copilot

Essential Experience:

Microsoft Fabric & Azure Data ecosystem
Lakehouse architectures & Data Factory
Python, PySpark, Spark SQL
Proven hands-on delivery in this stack

What’s on Offer:

Salary: 70,000
Excellent benefits & annual leave package
Strong progression & development opportunities
Opportunity to work on a modern, AI-enabled data platform
Real ownership and influence in a growing, forward-thinking data team

If you’re an experienced Data Engineer with solidMicrosoft Fabric and Azure experience, we’d love to hear from you!

Lead PySpark Engineer

Randstad Technologies Recruitment

PySpark Engineer Lead

As the Technical Lead, you will drive the high-stakes migration of legacy SAS analytics to a modern, cloud-native PySpark ecosystem on AWS. This isn’t just a lift and shift you will refactor complex procedural logic into scalable, production-ready distributed pipelines for a Tier-1 financial services environment.

Core Responsibilities

Engineering Leadership: Design and develop complex ETL/ELT pipelines and Data Marts using PySpark, EMR, and Glue.
Legacy Modernisation: Architect the conversion of SAS Base/Macros into modular, testable Python code using SAS2PY and manual refactoring.
Performance Tuning: Optimise Spark execution (partitioning, shuffling, caching) to ensure cost-efficient processing of massive financial datasets.
Quality & Governance: Implement rigorous CI/CD, unit testing, and data reconciliation frameworks to ensure “penny-perfect” accuracy.

Technical Stack

Engine: PySpark (Expert), Python (Clean Code/SOLID principles).
AWS: EMR, Glue, S3, Athena, IAM, Lambda.
Data Modeling: SCD Type 2, Fact/Dimension tables, Data Vault/Star Schema.
Legacy: Proficiency in reading/debugging SAS (Base, Macros, DI Studio).
DevOps: Git-based workflows, Jenkins/GitLab CI, Terraform.

Randstad Technologies is acting as an Employment Business in relation to this vacancy.

Active SC Clearance

Job Description:

The world of data in Central Banking is evolving rapidly. With the rise of detailed data collection in financial regulation and the swift advancements in cloud-native data technologies, the demand for visionary data engineers is growing. Were seeking a senior Data Engineer to join our Data Engineering team and play a pivotal role in shaping the Banks strategic cloud-first data platform.

As a senior member of the team, you will play a key role in designing and delivering robust, scalable data solutions that support the Banks core responsibilities around monetary policy, financial stability, and regulatory supervision. Youll contribute to technical design decisions, mentor engineers, and collaborate across teams to ensure our data infrastructure continues to evolve and meet future demands.

Role Responsibilities

Lead the design, development, and deployment of scalable, secure, and cost-effective distributed data solutions using Azure services (e.g., Azure Databricks, Azure Data Lake Storage, Azure Data Factory).
Architect and implement advanced data pipelines using Databricks, Delta Lake, Python and Spark, ensuring performance, reliability, and maintainability across cloud and on-prem environments.
Champion data quality, governance, and observability, ensuring data is accurate, timely, and fit-for-purpose for analytics, BI, and operational use cases.
Drive the modernization of legacy systems, leading the migration of data infrastructure to Azure with minimal disruption and long-term scalability.
Act as a technical authority on Azure-native data engineering, guiding best practices and setting standards across the team.
Mentor and coach junior and mid-level engineers, fostering a culture of continuous learning, innovation, and technical excellence.
Collaborate with architects, analysts, and stake holders to align data engineering efforts with strategic business goals and enterprise data strategy.
Evaluate and introduce emerging technologies, tools, and methodologies to enhance the Banks data capabilities.
Own the end-to-end delivery of complex data solutions, from requirements gathering to production deployment and support.
Contribute to the development of reusable frameworks, templates, and patterns to accelerate delivery and ensure consistency across projects.

Minimum Criteria

Extensive experience with Azure services including Azure Databricks, Azure Data Lake Storage, and Azure Data Factory.
Advanced proficiency in SQL, Python, and Spark (PySpark), with a strong focus on performance optimization and distributed processing.
Proven experience in CI/CD practices using industry-standard tools (e.g., GitHub Actions, Azure DevOps).
Strong understanding of data architecture principles and cloud-native design patterns.

Essential Criteria

Demonstrated ability to lead technical delivery, mentor engineering teams and collaborate with stakeholders to ensure alignment between data solutions and business strategy.
Proficiency in Linux/Unix environments and shell scripting.
Deep understanding of source control, testing strategies, and agile development practices.
Self-motivated with a strategic mindset and a passion for driving innovation in data engineering.

Desirable Criteria

Experience delivering data pipelines on Hortonworks/Cloudera on-prem and leading cloud migration initiatives.
Familiarity with: Apache Airflow
Data modelling and metadata management
Experience influencing enterprise data strategy and contributing to architectural governance.

Changed this now. I was confusing this with PDE role as I am working on that in parallel. Hope this makes sense now.

data solutions rather than architectures?

Should add Python here as a key tech we use

Have mentioned Python in ‘Minimum Criteria’ section below, but will add here too

this could be added to Essential Criteria ?

stakeholder and project management ?

Have updated #1 in essential criteria below. But I have now used the previous version to create requisition in OBS. Will see if it can be changed.

What is the difference between “minimum” and “essential” criteria. Both imply that they are mandatory and so could be one list?

This is a bit confusing. I used to have just one, but this is the standard format of JD that the Bank wants us to follow. Here is the difference:

Min Criteria:

This must list the minimum technical skills/experience/qualifications required to do the job and should be measurable/scoreable. The screening questions you select must link to these, in order to allow candidates to best demonstrate their suitability for the role.

Essential:

This lists other important technical skills/experience/qualifications, and also more behavioural competencies. These are ones that are better assessed at interview rather than on screening questions on the application form

Ok, I think we could go back and ask HR about this as it does seem confusing and to me doesn’t give a good impression of the Bank to applicants at it looks like 2 lists for the same thing.

I had checked this earlier, but seems they want us to follow this format. When I advertised last time, I just mentioned Minimum Criteria, but they said it has to be split into Minimum and Essential.

Don’t think we need to mention Atlas or Cloudera Manager as we hardly ever use those. Airflow could be useful so would leave that in.

Machine Learning Engineer

Location: Erskine, Scotland - Hybrid Candidates must be eligible for clearance Were looking for a talented and motivated Machine Learning Engineer to join our growing team. This is an exciting opportunity for women who want to advance their career in AI/ML, work on meaningful projects, and thrive in a supportive, collaborative environment. What Youll Do: Design, develop, and deploy machine learning models using modern frameworks and libraries. Collaborate closely with data scientists, engineers, and stakeholders to turn ideas into impactful solutions. Optimize and deploy models with tools like TensorFlow Serving, TorchServe, ONNX, and TensorRT. Build and manage ML pipelines using MLflow, Kubeflow, and Azure ML Pipelines. Work with large-scale data using PySpark and integrate ML solutions into production environments. Monitor and improve model performance to ensure accuracy and efficiency. Contribute to team knowledge by mentoring and supporting colleagues. Bring creativity and fresh perspectives to problem-solving and technical solutions. What Were Looking For: We welcome applications from women who are passionate about machine learning and eager to grow: Strong Python skills and experience with ML libraries (pandas, NumPy, scikit-learn, XGBoost, LightGBM, CatBoost, TensorFlow, Keras, PyTorch). Familiarity with model deployment and serving tools (ONNX, TensorRT, TensorFlow Serving, TorchServe). Experience with ML lifecycle tools (MLflow, Kubeflow, Azure ML Pipelines). Knowledge of distributed data processing (PySpark) and software engineering principles (Git). A collaborative mindset and excellent problem-solving abilities. Experience in data cleansing, exploratory data analysis, and visualisation. A continuous learning mindset and interest in emerging AI/ML technologies. Why Join Us? Work on impactful AI projects with real-world applications. Be part of a collaborative and forward-thinking team. Access to continuous learning and development opportunities. Flexible working arrangements and a supportive work culture. TPBN1\_UKTJ

Senior Data Engineer - (Python & SQL)

Senior Data Engineer (Python & SQL)
Location London with hybrid working Monday to Wednesday in the office
Salary 70,000 to 85,000 depending on experience
Reference J13026

An AI first SaaS business that transforms high quality first party data into trusted, decision ready insight at scale is looking for a Senior Data Engineer to join its growing data and engineering team.

This role sits at the core of data engineering. You will work with data that is often imperfect and transform it into well structured, reliable datasets that other teams can depend on. The focus is on engineering high quality data foundations rather than analytics or cloud infrastructure alone.

You will design and build clear, maintainable data pipelines using Python and SQL within a modern data and AI platform, with a strong focus on data quality, robustness, and long term reliability.

You will also play an important mentoring role within the team, supporting and guiding other data engineers and helping to raise engineering standards through thoughtful, hands on leadership.

Why join
A supportive and inclusive environment where different perspectives are welcomed and people are encouraged to contribute and be heard
Clear progression with space to deepen your technical expertise and grow your confidence at a sustainable pace
A team that values collaboration, good communication, and shared ownership over hero culture
The opportunity to work on meaningful data engineering problems where quality genuinely matters

What you will be doing
Designing and building cloud based data and machine learning pipelines that prepare data for analytics, AI, and product use
Writing clear, well-structured Python, PySpark, and SQL to transform and validate data from multiple upstream sources
Taking ownership of data quality, consistency, and reliability across the pipeline lifecycle
Shaping scalable data models that support a wide range of downstream use cases
Working closely with Product, Engineering, and Data Science teams to understand data needs and constraints
Mentoring and supporting other data engineers, sharing knowledge and encouraging good engineering practices
Contributing to the long term health of the data platform through thoughtful design and continuous improvement

What we are looking for
Strong experience using Python and SQL to transform large, real world datasets in production environments
A deep understanding of data structures, data quality challenges, and how to design reliable transformation logic
Experience working with modern data platforms such as Azure, GCP, AWS, Databricks, Snowflake, or similar
Confidence working with imperfect data and making it fit for consumption downstream
Experience supporting or mentoring other engineers through code reviews, pairing, or informal guidance
Clear, thoughtful communication and a collaborative mindset

You do not need to meet every requirement listed. What matters most is strong, hands on experience using Python and SQL to work confidently with complex, real world data, apply sound engineering judgement, and help others grow through your experience.

Right to work in the UK is required. Sponsorship is not available now or in the future.

Apply to find out more about the role.

If you have a friend or colleague who may be interested, referrals are welcome. For each successful placement, you will be eligible for our general gift or voucher scheme.
Datatech is one of the UK’s leading recruitment agencies specialising in analytics and is the host of the critically acclaimed Women in Data event. For more information, visit (url removed)

Principal Pricing Analyst

Job Title: Principal Pricing Analyst

Locations: This can be a largely remote position with the occasion travel to the office closest to you. We have offices based in Manchester, Stoke, London and Peterborough.

Role Overview

Markerstudy Group are looking for a Principal Pricing Analyst to join a quickly growing and developing pricing department across a range of insurance lines.

You will utilise your technical expertise, in-depth knowledge of insurance industry and market leading tools to produce creative and actionable pricing solutions. This role requires a large element of coaching team members and championing best practice across the department.

Reporting to the our Associate Director, you will make use of WTW Radar and Emblem and you will have responsibility for the development and maintenance of predictive models (GLM) and price optimisation including machine learning algorithms (GBM), LTV (Lifetime Value) and fair pricing principles. Ultimately creating value for our customers.

Bringing best in class pricing experience, you’ll be expected to provide pricing proposals considering customer and commercial outcomes, communicating these in a compelling, impactful way to all levels of stakeholders to help us make the right decisions at the right times.

You’ll work on multiple priorities within a fast paced, dynamic environment. You’ll need to be able to manage the expectations of stakeholders alongside prioritising your workload.

As a Principal Pricing Analyst, you will use your advanced analytical skills to:

Be a key stakeholder influencing the direction & outcome of projects
Provide technical leadership on WTW toolkit (in particular Radar Optimiser) to drive forward effective and efficient solutions
Provide thought leadership on optimisation and modelling concepts
Research, develop and champion the use of best practice methods and standards and ensure they are embedded throughout the department
Lead the development of the Groups pricing capability
Query large databases to extract and manipulate data that is fit for purpose
Oversee and assist in the development and implementation of the market leading methodologies you’ve identified
Continuously evaluate methodologies, understanding how they fit into the wider piece, and identify where they can be improved

Key Skills and Experience:

Previous experience within general insurance pricing
Experience with some of the following predictive modelling techniques; Logistic Regression, GBMs, Elastic Net GLMs, GAMs, Decision Trees, Random Forests, Neural Nets and Clustering
Experience in statistical and data science programming languages (e.g. R, Python, PySpark, SAS, SQL)
A quantitative degree (Mathematics, Statistics, Engineering, Physics, Computer Science, Actuarial Science)
Experience of WTW’s Radar software
Proficient at communicating results in a concise manner both verbally and written

About us

Markerstudy Group is a major force in UK general insurance market, combining scale with innovation. Markerstudy Group have deep product and distribution reach through multiple brands and an experienced leadership foundation coordinating diverse and fast-evolving business units. The Group employs more than 6,000 people across the UK.

Azure Cosmos DB Developer

Stackstudio Digital Ltd.

Role / Job Title:Azure Cosmos DB DeveloperWork Location:LondonMode of Working:Hybrid

Office Requirement: 1 day a week mandatory in office

The RoleThe role will be integral to realising the customer’s vision and strategy in transforming some of their critical application and data engineering components. As a global financial markets infrastructure and data provider, the customer keeps abreast of the latest cutting technologies enabling their core services and business requirements. The role is critical in this endeavour by the means of enabling the technical and quality assurance thought leadership and excellence required for the purpose.Your Responsibilities

Develop cloud-native applications using Azure Cosmos DB (SQL API, Mongo API).
Build reusable libraries and frameworks in C#/.NET or Node.js
Establish robust CI/CD pipelines through GitLAB DevOps to streamline delivery and reduce operational overhead.
Design efficient data models tailored to NoSQL principles and application requirements.
Write and tune queries to minimize RU consumption and improve response times.
Implement indexing strategies and partitioning schemes for optimal performance.
Develop efficient queries using Cosmos DB SQL syntax.
Implement automated testing and unit test frameworks.
Collaborate with solution architects and DevOps teams to integrate Cosmos DB into microservices
Ensure compliance with security, governance, and data protection standards

Your ProfileEssential Skills / Knowledge / Experience

Hands-on experience with Azure Cosmos DB including query optimisation and throughput management
Thorough understanding of Cosmos DB Change Data Feed (CDF) and integration and debugging of Spark with Cosmos CDF.
High technical proficiency with Cosmos DB SDKs (e.g., .NET, Java, Python, Node.js)
Thorough understanding of Spark distributed computing concepts.
High technical proficiency in PySpark and Spark Concepts.
Experience with concurrency patterns, CLR, and scalable application design.
Deep understanding of Azure services including Azure Functions, App Services, AKS, and Logic Apps.
Experience with SQL API and familiarity with other APIs.
Strong understanding of partitioning, indexing, and consistency levels.
Experience with Git, version control, and continuous integration tools.
Strong goal-oriented outlook, problem solving capabilities, written and verbal communication skills, and collaborative mindset to work with internal and external stakeholders.

Page 1 of 2

Frequently asked questions

Haystack features a wide range of PySpark job listings including roles such as Data Engineer, Big Data Developer, Data Scientist, and Analytics Engineer working with Apache Spark and PySpark in various industries.

While specific requirements vary by employer, most PySpark job listings require a strong understanding of Apache Spark fundamentals, including working with RDDs, DataFrames, and Spark SQL through PySpark.

Yes, Haystack includes many remote and hybrid PySpark job listings, allowing you to work from anywhere while leveraging your PySpark skills.

To increase your chances, make sure your resume highlights relevant experience with PySpark and big data technologies, tailor your application to each job description, and consider obtaining certifications in Apache Spark or related technologies.

Yes, Haystack lists entry-level PySpark roles for candidates new to the field or transitioning from other data technologies. These listings typically require foundational knowledge of Python and Spark along with eagerness to learn.