PySpark Jobs

Make yourself visible and let companies apply to you.

Roles

PySpark Jobs

Overview

Looking for top PySpark jobs? Explore the latest PySpark developer positions on Haystack, your go-to IT job board for data engineering roles. Find exciting opportunities to work with big data, Apache Spark, and advanced analytics today!

Lead Data Engineer (Azure / Databricks)

NO VISA REQUIREMENTS

MUST BE BASED NEAR GLASGOW TO WORK 3 DAYS ONSITE

My FMCG client is undergoing a major transformation of their entire data landscape-migrating from legacy systems and manual reporting into a modern Azure + Databricks Lakehouse. They are building a secure, automated, enterprise-grade platform powered by Lakeflow Declarative Pipelines, Unity Catalog and Azure Data Factory.
They are looking for a Lead Data Engineer to help deliver high-quality pipelines and curated datasets used across Finance, Operations, Sales, Customer Care and Logistics.

What You’ll Do

Lakehouse Engineering (Azure + Databricks)

Build and maintain scalable ELT pipelines using Lakeflow Declarative Pipelines, PySpark and Spark SQL.
Work within a Medallion architecture (Bronze ? Silver ? Gold) to deliver reliable, high-quality datasets.
Ingest data from multiple sources including ChargeBee, legacy operational files, SharePoint, SFTP, SQL, REST and GraphQL APIs using Azure Data Factory and metadata-driven patterns.
Apply data quality and validation rules using Lakeflow Declarative Pipelines expectations.

Curated Layers & Data Modelling

Develop clean and conforming Silver & Gold layers aligned to enterprise subject areas.
Contribute to dimensional modelling (star schemas), harmonisation logic, SCDs and business marts powering Power BI datasets.
Apply governance, lineage and permissioning through Unity Catalog.

Orchestration & Observability

Use Lakeflow Workflows and ADF to orchestrate and optimise ingestion, transformation and scheduled jobs.
Help implement monitoring, alerting, SLAs/SLIs and runbooks to support production reliability.
Assist in performance tuning and cost optimisation.

DevOps & Platform Engineering

Contribute to CI/CD pipelines in Azure DevOps to automate deployment of notebooks, Lakeflow Declarative Pipelines, SQL models and ADF assets.
Support secure deployment patterns using private endpoints, managed identities and Key Vault.
Participate in code reviews and help improve engineering practices.

Collaboration & Delivery

Work with BI and Analytics teams to deliver curated datasets that power dashboards across the business.
Contribute to architectural discussions and the ongoing data platform roadmap.

Tech You’ll Use

Databricks: Lakeflow Declarative Pipelines, Lakeflow Workflows, Unity Catalog, Delta Lake
Azure: ADLS Gen2, Data Factory, Event Hubs (optional), Key Vault, private endpoints
Languages: PySpark, Spark SQL, Python, Git
DevOps: Azure DevOps Repos & Pipelines, CI/CD
Analytics: Power BI, Fabric

What We’re Looking For

Experience

Commercial and proven Lead Data Engineering experience.
Hands-on experience delivering solutions on Azure + Databricks.
Strong PySpark and Spark SQL skills within distributed compute environments.
Experience working in a Lakehouse/Medallion architecture with Delta Lake.
Understanding of dimensional modelling (Kimball), including SCD Type 1/2.
Exposure to operational concepts such as monitoring, retries, idempotency and backfills.

Mindset

Good energy and enthusiasm
Keen to grow within a modern Azure Data Platform environment.
Comfortable with Git, CI/CD and modern engineering workflows.
Able to communicate technical concepts clearly to non-technical stakeholders.
Quality-driven, collaborative and proactive.

Why Join?

Opportunity to shape and build a modern enterprise Lakehouse platform.
Hands-on work with Azure, Databricks and leading-edge engineering practices.
Real progression opportunities within a growing data function.
Direct impact across multiple business domains.

We are seeking a hands-on Azure Cosmos DB Developer to join a high-profile cloud-native data transformation programme within financial services. This is a contract opportunity working on scalable, production-grade Azure microservices and distributed data platforms.

You will design and optimise Azure Cosmos DB (SQL API) solutions, focusing on query optimisation, RU consumption, throughput management, partitioning strategies and indexing performance. The role also involves integrating Cosmos DB Change Data Feed (CDF) with Spark / PySpark in distributed environments.

Key Responsibilities:

* Develop cloud-native applications using Azure Cosmos DB

* Implement partitioning, indexing and consistency models

* Optimise queries to reduce RU and improve latency

* Integrate Cosmos DB Change Feed with Spark

* Build services using C# / .NET Core or Node.js

* Deploy solutions via Azure Functions, AKS, App Services

* Contribute to CI/CD pipelines (GitLab / DevOps)

Required Skills:

* Strong hands-on experience with Azure Cosmos DB

* Experience with Spark or PySpark

* Knowledge of SQL API, Change Feed, throughput management

* Experience with C#, .NET, Node.js or Python

* Understanding of distributed systems and microservices architecture

* Azure platform experience (AKS, Functions, App Services)

This role suits a senior-level engineer who thrives in performance optimisation, scalable system design, and Azure cloud engineering within enterprise environments.

Apply now to discuss further

Data Engineer | Outside IR35 | £400 - £500 | 6 months | Hybrid Nottingham

Opus Recruitment Solutions

We’re looking for a highly skilled Data Engineer to join a growing data team supporting a large-scale modern data platform project. Working 3 days a week onsite on the outskirts of Nottingham, you'll be a key contributor in evolving the organisation’s data capability, focusing on best‑practice engineering, clean architecture, and high‑value BI delivery. Key Responsibilities Design, build and optimise ETL/ELT pipelines using Azure Data Factory and Databricks. Develop scalable data models and transformations using SQL and Python. Work hands‑on with Databricks (Lakehouse, Delta tables, notebooks, workflows). Deliver high-quality dashboards and reporting solutions using Power BI. Implement best practices for data quality, governance, lineage and automation. Collaborate with cross‑functional teams including analysts, product owners and business stakeholders. Support performance tuning, cost optimisation and reliability improvements across the data estate. Document pipelines, models and processes to ensure smooth knowledge transfer.Technical Skills Required Databricks – notebooks, Delta Lake, Spark (PySpark desirable) Power BI – data modelling, DAX, dashboard/report development SQL – advanced querying, performance optimisation, data modelling Python – scripting, transformation logic, automation Azure Data Factory – pipelines, triggers, mapping data flows Understanding of data warehousing / lakehouse principles Experience working in cloud-based data ecosystems (Azure) Strong appreciation of data quality, governance and best practicesIf this is a role that suits your skillset, can work onsite 3 days per week and immediately available then please apply for the job advert directly or reach out to myself at (url removed)

Data Engineer Analytics, Assistant Vice President

This job is with State Street, an inclusive employer and a member of myGwork – the largest global platform for the LGBTQ+ business community. Please do not contact the recruiter directly.

We are seeking an Analytics‑focused Data Engineer to design and build trusted, analytics‑ready datasets on a modern AWS and Databricks data platform. This role is critical to enabling business intelligence, reporting, and advanced analytics by transforming raw data into well‑modeled, high‑quality, and performant analytics layers.

You will work closely with analytics, BI, finance, and product teams to ensure data is easy to consume, well understood, and reliable for decision‑making at scale.

Key Responsibilities

Analytics‑Ready Data Modeling

Design and implement analytics‑optimized data models (fact/dimension, star/snowflake schemas).

Build and maintain curated analytics layers (gold tables) in Databricks using Delta Lake.

Translate business requirements into clear, reusable datasets for dashboards and reports.

Support semantic consistency across metrics, dimensions, and KPIs.

Data Pipelines & Transformations

Develop and maintain ETL/ELT pipelines using Databricks (Spark SQL, PySpark).

Transform raw and intermediate data into clean, documented, and performant analytics datasets.

Implement incremental processing, partitioning, and optimization techniques for BI workloads.

Ensure pipelines are resilient, observable, and production‑ready.

AWS Analytics Platform

Leverage AWS services such as S3, Glue, Redshift, Lambda, and IAM to support analytics use cases.

Integrate Databricks with AWS storage and security services.

Monitor pipeline execution, performance, and cost for analytics workloads.

Data Quality, Metrics & Trust

Implement data quality checks, reconciliation logic, and anomaly detection for analytics data.

Validate accuracy of business metrics used in executive dashboards and reports.

Support data lineage, documentation, and metric definitions.

Partner with stakeholders to ensure a single source of truth for analytics.

BI & Analytics Enablement

Support downstream tools such as Power BI, Tableau, or similar BI tools.

Optimize datasets for dashboard performance and concurrency.

Collaborate with analysts to improve query patterns and data usage.

Enable self‑service analytics through well‑designed datasets and documentation.

Required Qualifications

5-8+ years of experience in data engineering or analytics engineering roles.

Strong experience with Databricks for analytics workloads.

Advanced proficiency in SQL (complex transformations, window functions, performance tuning).

Solid experience with AWS‑based analytics architectures.

Strong experience with data modeling for analytics.

Proficiency in Python or PySpark.

Experience supporting BI, reporting, and analytics teams.

Nice to Have

Experience with Analytics Engineering / ELT patterns.

Familiarity with dbt or similar transformation frameworks.

Experience supporting finance or executive reporting.

Knowledge of data governance, metric catalogs, or data discovery tools.

Experience with streaming data for near‑real‑time analytics.

Exposure to regulated or enterprise analytics environments.

About State Street Across the globe, institutional investors rely on us to help them manage risk, respond to challenges, and drive performance and profitability. We keep our clients at the heart of everything we do, and smart, engaged employees are essential to our continued success.

We are committed to fostering an environment where every employee feels valued and empowered to reach their full potential. As an essential partner in our shared success, you’ll benefit from inclusive development opportunities, flexible work-life support, paid volunteer days, and vibrant employee networks that keep you connected to what matters most. Join us in shaping the future.

As an Equal Opportunity Employer, we consider all qualified applicants for all positions without regard to race, creed, color, religion, national origin, ancestry, ethnicity, age, disability, genetic information, sex, sexual orientation, gender identity or expression, citizenship, marital status, domestic partnership or civil union status, familial status, military and veteran status, and other characteristics protected by applicable law.

Discover more information on jobs at StateStreet.com/careers

Read our CEO Statement

]]>

Data Engineer – TV Advertising Data (FAST)

Data Engineer - TV Advertising Data (FAST) Location: London - 3 days onsite Salary £75,000 - £85,000 Neg DOE Reference : J13057 Note: Full and current UK working rights required for this role We're currently seeking a Data Engineer to build the foundations behind the rapidly growing FAST (Free Ad Supports Streaming TV channels) A pioneering opportunity to be involved with direct to consumer advertising for a Global player in the field. Someone who is passionate about how data drives the industry and to help optimise campaigns, measure performance, and monetise content. Key Responsibilities ·Design, build, and maintain scalable ETL/ELT pipelines that transform raw data into reliable, analytics-ready datasets ·Ingest, integrate, and manage new data sources across advertising, audience, platform, and content data within Fremantle's Microsoft Fabric environment ·Deliver robust data flows that underpin global FAST dashboards, monetisation insights, and audience viewing metrics ·Work closely with the central Data & Analytics team to enable high-quality Power BI reporting and analysis ·Ensure strong data governance, integrity, and security across the Azure/Fabric ecosystem ·Optimise data pipelines for performance, scalability, and efficiency, following best-practice engineering standards including version control and code reviews ·Monitor pipeline health, data freshness, and quality, implementing proactive alerting and issue resolution ·Translate business and analytical needs into well-structured data models and technical solutions ·Automate data workflows to minimise manual processes and improve operational reliability ·Maintain clear documentation of pipelines, datasets, and data flows to support collaboration and smooth handovers ·Stay current with data engineering best practices, particularly within the Microsoft technology stack Skills & Experience ·5+ years' experience working as a Data Engineer or in a similar role ·Proven experience with cloud-based data platforms (Azure, AWS, SQL, Snowflake, Springserv); Microsoft Fabric experience is a strong plus ·Strong proficiency in Spark SQL and PySpark, including complex transformations ·Experience building ETL/ELT pipelines using tools such as Azure Data Factory or equivalent ·Ability to write efficient, reusable scripts for transformation, validation, and automation ·Hands-on experience integrating data from APIs (REST, JSON), including automated data collection ·Solid understanding of data modelling best practices for analytics and dashboards ·Confidence working with large, complex datasets across multiple formats (CSV, JSON, Parquet, databases, APIs) ·Strong problem-solving skills and the ability to diagnose and resolve data issues ·Excellent communication skills and experience working with cross-functional teams ·Genuine curiosity about how data drives content performance, audience behaviour, and monetisation If this sounds like the role for you then please apply today

Senior Pricing Analyst

Locations: Peterborough, Manchester, Stoke, Southport, Kent, London, Cambridgeshire (Hybrid/Remote options available)Department: Retail PricingHybrid and largely remote options available

Join a fast-paced, innovative environment where your pricing insights and analytical skills will directly influence strategic decisions and drive profitability across a diverse portfolio of personal lines products.

About the Roles

We’re looking for talented individuals at multiple levels — Senior Analyst, and Principal Analyst — to join our growing Pricing function. Whether you’re deep into data modelling or ready to lead pricing strategies and performance frameworks, we have the right opportunity for you.

Key Responsibilities Include:

Design and optimise pricing solutions aligned to business goals
Develop and maintain performance monitoring frameworks and risk models
Conduct in-depth analysis using predictive modelling to influence pricing decisions
Collaborate with cross-functional teams (Underwriting, Technical Modelling, Data)
Champion innovation, continuous improvement, and pricing best practice
Lead or contribute to strategic initiatives and tactical pricing interventions
Coach and mentor junior analysts

About You

We’re looking for curious, data-driven minds with the following experience:

Proven experience in General Insurance Pricing (Personal Lines preferred)
Strong coding skills in Python, R, SQL, PySpark and SAS
Experience with modelling techniques (GLMs, GBMs, Decision Trees, Neural Nets, Clustering)
Exposure to or expertise in WTW’s Radar or Emblem software
Excellent communication skills — both written and verbal — with a commercial mindset

Leadership candidates will also demonstrate:

Experience leading projects or teams
Ability to shape strategy and drive cross-functional collaboration
A passion for mentoring and developing talent

Why Join Us?

Be part of a collaborative, inclusive team making a tangible business impact
Work in a culture that values innovation and continuous learning
Take advantage of hybrid flexibility and multiple UK office locations
Progress your career through structured development opportunities and mentorship

Principal Pricing Analyst

Job Title: Principal Pricing Analyst

Locations: This can be a largely remote position with the occasion travel to the office closest to you. We have offices based in Manchester, Stoke, London and Peterborough.

Role Overview

Markerstudy Group are looking for a Principal Pricing Analyst to join a quickly growing and developing pricing department across a range of insurance lines.

You will utilise your technical expertise, in-depth knowledge of insurance industry and market leading tools to produce creative and actionable pricing solutions. This role requires a large element of coaching team members and championing best practice across the department.

Reporting to the our Associate Director, you will make use of WTW Radar and Emblem and you will have responsibility for the development and maintenance of predictive models (GLM) and price optimisation including machine learning algorithms (GBM), LTV (Lifetime Value) and fair pricing principles. Ultimately creating value for our customers.

Bringing best in class pricing experience, you’ll be expected to provide pricing proposals considering customer and commercial outcomes, communicating these in a compelling, impactful way to all levels of stakeholders to help us make the right decisions at the right times.

You’ll work on multiple priorities within a fast paced, dynamic environment. You’ll need to be able to manage the expectations of stakeholders alongside prioritising your workload.

As a Principal Pricing Analyst, you will use your advanced analytical skills to:

Be a key stakeholder influencing the direction & outcome of projects
Provide technical leadership on WTW toolkit (in particular Radar Optimiser) to drive forward effective and efficient solutions
Provide thought leadership on optimisation and modelling concepts
Research, develop and champion the use of best practice methods and standards and ensure they are embedded throughout the department
Lead the development of the Groups pricing capability
Query large databases to extract and manipulate data that is fit for purpose
Oversee and assist in the development and implementation of the market leading methodologies you’ve identified
Continuously evaluate methodologies, understanding how they fit into the wider piece, and identify where they can be improved

Key Skills and Experience:

Previous experience within general insurance pricing
Experience with some of the following predictive modelling techniques; Logistic Regression, GBMs, Elastic Net GLMs, GAMs, Decision Trees, Random Forests, Neural Nets and Clustering
Experience in statistical and data science programming languages (e.g. R, Python, PySpark, SAS, SQL)
A quantitative degree (Mathematics, Statistics, Engineering, Physics, Computer Science, Actuarial Science)
Experience of WTW’s Radar software
Proficient at communicating results in a concise manner both verbally and written

About us

Markerstudy Group is a major force in UK general insurance market, combining scale with innovation. Markerstudy Group have deep product and distribution reach through multiple brands and an experienced leadership foundation coordinating diverse and fast-evolving business units. The Group employs more than 6,000 people across the UK.

Lead PySpark Engineer (Cloud Migration)

Role Type: 5-Month Contract

Location: Remote (UK-Based)

Experience Level: Lead / Senior (5+ years PySpark)

Role Overview

We are seeking a Lead PySpark Engineer to drive a large-scale data modernisation project, transitioning legacy data workflows into a high-performance AWS cloud environment. This is a hands-on technical role focused on converting legacy SAS code into production-ready PySpark pipelines within a complex financial services landscape.

Key Responsibilities

Code Conversion: Lead the end-to-end migration of SAS code (Base SAS, Macros, DI Studio) to PySpark using automated tools (SAS2PY) and manual refactoring.
Pipeline Engineering: Design, build, and troubleshoot complex ETL/ELT workflows and data marts on AWS.
Performance Tuning: Optimise Spark workloads for execution efficiency, partitioning, and cost-effectiveness.
Quality Assurance: Implement clean coding principles, modular design, and robust unit/comparative testing to ensure data accuracy throughout the migration.
Engineering Excellence: Maintain Git-based workflows, CI/CD integration, and comprehensive technical documentation.

Technical Requirements

PySpark (P3): 5+ years of hands-on experience writing scalable, production-grade PySpark/Spark SQL.
AWS Data Stack (P3): Strong proficiency in EMR, Glue, S3, Athena, and Glue Workflows.
SAS Knowledge (P1): Solid foundation in SAS to enable the understanding and debugging of legacy logic for conversion.
Data Modeling: Expertise in ETL/ELT, dimensions, facts, SCDs, and data mart architecture.
Engineering Quality: Experience with parameterisation, exception handling, and modular Python design.

Additional Details

Industry: Financial Services experience is highly desirable.
Working Pattern: Fully remote with internal team collaboration days.
Benefits: 33 days holiday entitlement (pro-rata).

Randstad Technologies is acting as an Employment Business in relation to this vacancy.

Data Engineer

Youngs Employment Services

London + 2 or 3 days work from home

Circ £60,000 - £70,000 + Excellent Benefits Package

A fantastic opportunity is available for a Data Engineer that enjoys working in a fast paced and collaborative team playing work environment. Our client has been expanding at a remarkable pace and have transformed their technical landscape with leading edge solutions. Having implemented a new MS Fabric based Data platform, the need is now to scale up and deliver data driven insights and strategies right across the business globally. The Data Engineer will be joining a close-knit team that is the hub of our client’s global data & analytics operation. Previous experience with MS Fabric would be beneficial but is by no means essential. Interested candidates must have experience in a similar role with MS Azure Data Platforms, Synapse, Databricks or other Cloud platforms such as AWS, GCP, Snowflake etc.

Key Responsibilities will include;

* Design, implement, and optimize end-to-end solutions using Fabric components:

* o Data Factory (pipelines, orchestration)

* o Data Engineering (Lakehouse, notebooks, Apache Spark)

* o Data Warehouse (SQL endpoints, schemas, MPP performance tuning)

* o Real-Time Analytics (KQL databases, event ingestion)

* o Manage and enhance OneLake architecture, delta lake tables, security policies, and data governance within Fabric.

* o Build scalable, reusable data assets and engineering patterns that support analytics, reporting, and machine learning workloads.

* Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver effective solutions.

* Troubleshoot and resolve data-related issues in a timely manner.

Key Experience, Skills and Knowledge:

* Proven 2 yrs+ experience as a Data Engineer or similar role, with a strong focus on PySpark, SQL, Microsoft Azure Data platforms and Power BI an advantage

* Proficiency in development languages suitable for intermediate-level data engineers, such as:

* Python / PySpark: Widely used for data manipulation, analysis, and scripting.

* SQL: Essential for querying and managing relational databases.

* Understanding of D365 F&O Data Structures is highly desirable

* Strong problem-solving skills and attention to detail.

* Excellent communication and collaboration abilities.

This is a hybrid role based in Central / West London with the flexibility to work from home 2 or 3 days per week. Salary will be dependent on experience and expected to be in the region of £60,000 - £70,000 + an attractive benefits package including bonus scheme.

For further information, please send your CV to Wayne Young at Young’s Employment Services Ltd. YES are operating as both a recruitment Agency and Recruitment Business

Prospect is looking for someone who is equally passionate about football and analytics and is excited about the possibilities of the intersection of the two. The ideal candidate would have experience as a problem solver, data engineer, and communicator, preferably with a degree in a quantitative field (such as computer science, engineering, physics, statistics or applied mathematics). You’ll work as part of cross-functional teams to help solve challenges and aid decision makers across the sporting landscape, from elite professional teams, to leagues and broadcasters, applying advanced analytics and modelling techniques. Roles & Responsibilities: - A passion for sport with an understanding of our clients’ sporting disciplines or an eagerness to learn about them. - A strong programming proficiency in Python and SQL querying. Experience with relational database platforms. - Knowledge of cloud technologies. It is an advantage (but not a requirement) to have had experience working with AWS. - Excellent collaboration and communication skills. - 4+ years of experience in big data related software development; experience with data modelling, design patterns and building highly scalable and secured solutions. - Practical knowledge of software engineering concepts and best practices, like testing frameworks, packaging, API design, DevOps, DataOps and MLOps. - The right to work in the United Kingdom.

Page 2 of 2

Frequently asked questions

Haystack features a wide range of PySpark job listings including roles such as Data Engineer, Big Data Developer, Data Scientist, and Analytics Engineer working with Apache Spark and PySpark in various industries.

While specific requirements vary by employer, most PySpark job listings require a strong understanding of Apache Spark fundamentals, including working with RDDs, DataFrames, and Spark SQL through PySpark.

Yes, Haystack includes many remote and hybrid PySpark job listings, allowing you to work from anywhere while leveraging your PySpark skills.

To increase your chances, make sure your resume highlights relevant experience with PySpark and big data technologies, tailor your application to each job description, and consider obtaining certifications in Apache Spark or related technologies.

Yes, Haystack lists entry-level PySpark roles for candidates new to the field or transitioning from other data technologies. These listings typically require foundational knowledge of Python and Spark along with eagerness to learn.