Design and automate the deployment of Cloudera CDP components (Data Lake, Data Hubs, Data Services) using Terraform.
Build and maintain CI/CD pipelines using GitHub Actions for infrastructure and data pipeline automation.
Collaborate with data engineering teams to integrate CDP with existing Big Data workflows.
Write and maintain automation scripts using Shell and Python.
Manage infrastructure and configuration using YAML and JSON.
Configure and tune Cloudera services for performance and security.
Ensure secure, scalable, and cost-effective deployments on cloud platforms (AWS, Azure, or GCP).
An understanding of all the Hadoop daemons along with their roles and responsibilities in the cluster.
Troubleshoot and resolve issues in Cloudera services.
Add and remove nodes in the cluster.
Rebalance nodes in the cluster.
Implement security using an authentication and authorization system such as Kerberos.
Knowledge of the changes required to migrate to Cloudera's cloud offering (CDP)
Knowledge of Cloudera data services (CDW, CDE, CDF, CAI)
Design cloud-native application architectures and optimize applications for AWS
Network connectivity: Direct Connect, VPN, VPC, security groups, NACLs, Route 53
Must have in-depth AWS development experience (Containerization - Glue, Docker, Amazon EKS, Lambda, EC2, S3, Amazon DocumentDB, PostgreSQL)
Strong knowledge of DevOps and CI/CD pipelines (GitHub, Jenkins)
Scripting capability and the ability to develop AWS environments as code
Hands-on AWS experience with at least one implementation (preferably in an enterprise-scale environment)
Experience with core AWS platform architecture, including Organizations, account design, VPCs, subnets, and segmentation strategies.
Environment and application automation
CloudFormation and third-party automation approach/strategy
AWS Cost Management and Optimization
Extensive experience with Cloudera Data Platform (CDP) and CDP services, along with Big Data knowledge.
Proficiency in Terraform for infrastructure as code (IaC).
Strong hands-on experience with Cloudera CDP and Hadoop ecosystem (Hive, Impala, HDFS, etc.)
Experience with GitHub Actions or similar CI/CD tools (e.g., Jenkins, GitLab CI).
Solid scripting skills in Shell and Python.
Extensive experience in designing, provisioning, deploying, and configuring Cloudera clusters based on the customer’s needs.
Extensive experience with AWS services (EC2, VPC, ELB, S3, RDS, Lambda, Route 53, etc.) and the ability to design and deploy Cloudera clusters on AWS.
Strong knowledge of data processing using Cloudera services.
Good understanding of CI/CD concepts, version control, and DevOps best practices.