Skill Profile
- PySpark - Advanced (P3)
- AWS - Advanced (P3)
- SAS - Foundational (P1)
Key Responsibilities
Technical Delivery
- Design, develop, and maintain complex PySpark solutions for ETL/ELT and data mart workloads.
- Convert and refactor legacy SAS code into optimized PySpark solutions using a combination of automated tooling and manual refactoring (see the sketch after this list).
- Build scalable, maintainable, and production-ready data pipelines.
- Modernize legacy data workflows into cloud-native architectures.
- Ensure data accuracy, quality, integrity, and reliability across transformation processes.
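To illustrate the conversion work at this level, here is a minimal sketch of how a hypothetical SAS DATA step might be refactored into the DataFrame API; the table names, paths, and columns are invented for the example.

```python
# Hypothetical SAS source (names invented for illustration):
#   data work.high_value;
#       set raw.transactions;
#       where amount > 1000;
#       fee = amount * 0.02;
#   run;
#
# A rough PySpark equivalent of the same filter-and-derive logic:
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("sas_migration_sketch").getOrCreate()

# Assumed input location; real migrations map SAS libraries to lake paths.
transactions = spark.read.parquet("s3://example-bucket/raw/transactions/")

high_value = (
    transactions
    .where(F.col("amount") > 1000)              # WHERE clause
    .withColumn("fee", F.col("amount") * 0.02)  # derived column
)

high_value.write.mode("overwrite").parquet("s3://example-bucket/marts/high_value/")
```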
Cloud & Data Engineering (AWS-Focused)
- Develop and deploy data pipelines using AWS services such as EMR, Glue, S3, and Athena.
- Optimize Spark workloads for performance, scalability, partitioning strategy, and cost efficiency (a sketch follows this list).
- Implement CI/CD pipelines and Git-based version control for automated deployment.
- Collaborate with architects, engineers, and business stakeholders to deliver high-quality cloud data solutions.
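As a deliberately simplified sketch of such a pipeline step, the snippet below reads raw data from S3 and writes date-partitioned Parquet so that Athena scans, and therefore cost, stay proportional to the dates queried. The bucket paths and the `event_ts`/`event_id` columns are assumptions for the example.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("s3_pipeline_sketch").getOrCreate()

events = spark.read.json("s3://example-bucket/landing/events/")  # assumed input

cleaned = (
    events
    .withColumn("event_date", F.to_date("event_ts"))  # assumes an event_ts timestamp column
    .dropDuplicates(["event_id"])                     # assumes an event_id business key
)

# Partitioning on event_date lets Athena prune partitions instead of scanning everything.
(cleaned
 .repartition("event_date")
 .write
 .mode("overwrite")
 .partitionBy("event_date")
 .parquet("s3://example-bucket/curated/events/"))
```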
Core Technical Skills
PySpark & Data Engineering
- 5+ years of hands-on PySpark experience (Advanced level).
- Strong ability to write production-grade, maintainable data engineering code.
- Solid understanding of:
- ETL/ELT design patterns
- Data modelling concepts
- Fact and dimension modelling
- Data marts
- Slowly Changing Dimensions (SCDs)
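As one hedged illustration of the last point, a Type 2 SCD update can be sketched in PySpark roughly as follows; the dimension layout (`customer_id` key, `address` as the tracked attribute, `is_current`/`valid_from`/`valid_to` housekeeping columns) is an assumed convention, not a prescribed one.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

dim = spark.read.parquet("s3://example-bucket/dim_customer/")         # existing dimension (assumed)
incoming = spark.read.parquet("s3://example-bucket/stage_customer/")  # latest snapshot (assumed)

current = dim.where(F.col("is_current"))

# Incoming rows whose tracked attribute differs from the current version become new versions.
changed = (incoming.alias("i")
           .join(current.alias("d"), F.col("i.customer_id") == F.col("d.customer_id"))
           .where(F.col("i.address") != F.col("d.address"))
           .select("i.*"))
changed_keys = changed.select("customer_id")

# Close out the superseded current rows.
closed = (current.join(changed_keys, "customer_id")
          .withColumn("is_current", F.lit(False))
          .withColumn("valid_to", F.current_date()))

# Open the new versions (assumes the snapshot carries the dimension's business columns).
opened = (changed
          .withColumn("is_current", F.lit(True))
          .withColumn("valid_from", F.current_date())
          .withColumn("valid_to", F.lit(None).cast("date")))

# Rebuild: untouched history + still-current rows + closed versions + new versions.
history = dim.where(~F.col("is_current"))
still_current = current.join(changed_keys, "customer_id", "left_anti")
result = history.unionByName(still_current).unionByName(closed).unionByName(opened)
result.write.mode("overwrite").parquet("s3://example-bucket/dim_customer_v2/")
```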
Spark Performance & Optimization
- Expertise in Spark execution planning, partitioning strategies, and performance tuning.
- Experience troubleshooting distributed data pipelines at scale.
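A small, hedged example of this kind of tuning: broadcasting a small dimension table to avoid shuffling the large fact side, then inspecting the physical plan to confirm the choice. Table names and paths are invented.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("tuning_sketch").getOrCreate()

facts = spark.read.parquet("s3://example-bucket/facts/")    # large table (assumed)
dim = spark.read.parquet("s3://example-bucket/dim_small/")  # small lookup (assumed)

# Broadcasting the small side turns a shuffle join into a map-side hash join.
joined = facts.join(broadcast(dim), "product_id")

# Inspect the physical plan to confirm a BroadcastHashJoin was chosen
# and to spot unexpected shuffles or full scans.
joined.explain(mode="formatted")

# Keep output files reasonably sized by matching partition count to data volume.
joined.coalesce(64).write.mode("overwrite").parquet("s3://example-bucket/joined/")
```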
Python & Engineering Quality
- Strong Python programming skills with emphasis on clean, modular, and maintainable code.
- Experience applying engineering best practices, including:
- Parameterization
- Configuration management
- Structured logging
- Exception handling
- Modular design principles
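A compact sketch of how these practices can fit together in a job entry point; the config layout and paths are illustrative assumptions rather than a mandated structure.

```python
import json
import logging
import sys
from dataclasses import dataclass

from pyspark.sql import SparkSession

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",  # consistent, parseable log lines
)
log = logging.getLogger("pipeline")

@dataclass
class JobConfig:
    input_path: str
    output_path: str

def load_config(path: str) -> JobConfig:
    """Externalized configuration instead of hard-coded paths."""
    with open(path) as f:
        return JobConfig(**json.load(f))

def run(spark: SparkSession, cfg: JobConfig) -> None:
    """Modular transform step, easy to test in isolation."""
    spark.read.parquet(cfg.input_path).write.mode("overwrite").parquet(cfg.output_path)

def main() -> None:
    cfg = load_config(sys.argv[1])  # config file path passed as a parameter
    spark = SparkSession.builder.appName("configured_job").getOrCreate()
    try:
        run(spark, cfg)
        log.info("job succeeded: %s -> %s", cfg.input_path, cfg.output_path)
    except Exception:
        log.exception("job failed")  # full traceback in logs; re-raise for the scheduler
        raise
    finally:
        spark.stop()

if __name__ == "__main__":
    main()
```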
SAS & Legacy Analytics (Foundational)
- Working knowledge of Base SAS, Macros, and DI Studio.
- Ability to interpret and analyze legacy SAS code for migration to PySpark.
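For example, a hypothetical PROC SQL step maps quite directly onto Spark SQL, which is often the first pass of a migration before refactoring into the DataFrame API; all names below are invented.

```python
# Hypothetical SAS source:
#   proc sql;
#       create table work.totals as
#       select region, sum(amount) as total_amount
#       from raw.sales
#       group by region;
#   quit;

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("proc_sql_sketch").getOrCreate()

spark.read.parquet("s3://example-bucket/raw/sales/").createOrReplaceTempView("sales")

totals = spark.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region
""")
```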
Data Engineering & Testing
- Understanding of end-to-end data flows, orchestration frameworks, pipelines, and change data capture (CDC).
- Experience creating ETL test cases, unit tests, and data comparison/validation frameworks.
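One hedged sketch of such a validation check, usable as a pytest-style unit test or a post-migration reconciliation step (the paths and the `spark` fixture are assumptions):

```python
from pyspark.sql import DataFrame

def assert_frames_match(expected: DataFrame, actual: DataFrame) -> None:
    """Fail if the two frames differ in either direction (row-level, duplicate-aware)."""
    missing = expected.exceptAll(actual).count()  # rows expected but not produced
    extra = actual.exceptAll(expected).count()    # rows produced but not expected
    assert missing == 0 and extra == 0, f"{missing} missing rows, {extra} extra rows"

def test_migration_parity(spark):  # 'spark' assumed to be a session fixture
    legacy = spark.read.parquet("s3://example-bucket/sas_extract/")  # assumed legacy output
    modern = spark.read.parquet("s3://example-bucket/pyspark_out/")  # assumed new output
    assert_frames_match(legacy, modern)
```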
Engineering Practices
- Proficient in Git workflows, branching strategies, pull requests, and code reviews.
- Ability to document technical decisions, architecture, and data flows.
- Experience with CI/CD tooling for data engineering pipelines.
AWS & Platform Expertise (Advanced)
- Strong hands-on experience with:
- Amazon S3
- Amazon EMR
- AWS Glue and Glue Workflows
- Amazon Athena (see the sketch at the end of this section)
- IAM
- Solid understanding of distributed computing and big data processing in AWS environments.
- Experience deploying and operating large-scale data pipelines in the cloud.
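As a small illustration of driving these services programmatically, the sketch below submits an Athena query with boto3 and polls for completion; the region, database, results bucket, and table are assumptions for the example.

```python
import time

import boto3

athena = boto3.client("athena", region_name="eu-west-1")  # region is an assumption

def run_query(sql: str) -> str:
    """Submit an Athena query, block until it finishes, and return the execution id."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "example_db"},                       # assumed database
        ResultConfiguration={"OutputLocation": "s3://example-bucket/athena/"},  # assumed bucket
    )["QueryExecutionId"]

    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {qid} ended in state {state}")
    return qid

run_query("SELECT event_date, COUNT(*) FROM events GROUP BY event_date")  # assumed table
```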
Desirable Experience
- Experience within banking, financial services, or other regulated industries.
- Background in SAS modernization or cloud migration programs.
- Familiarity with DevOps practices and infrastructure-as-code tools such as Terraform or CloudFormation.
- Experience working in Agile or Scrum delivery environments.