Site Reliability Engineer – (SRE, Site Reliability Engineer, Terraform, AKS, Azure, Kubernetes, PowerShell, Python, Bash, Datadog, Monitoring Tools) – Permanent – Remote

Charles Simon Associates are currently recruiting for a Site Reliability Engineer on a permanent basis. This role is for a global business with an HQ in the City of London. Candidates will need to be British Citizens due to Security Clearance requirements.

Location: Remote, with some travel to London
Salary: Up to £125,000 per annum

Skills/Requirements for the Site Reliability Engineer:
* Extensive SRE experience within previous roles
* Strong Terraform skills
* Proven Kubernetes and AKS experience
* Experience in creating and modifying Terraform deployments on live environments
* Experience with monitoring solutions, ideally Datadog, although Azure Application Insights, Log Analytics or Grafana would also be considered
* Scripting skills for automation in PowerShell, Python or Bash
* Experience with web-based applications

Desirable Skills:
* Knowledge or commercial experience of Microservices Architecture
* Kanban
* Any prior experience of working with Puppet and Chef would be advantageous

Start date is ASAP for the Site Reliability Engineer

The Site Reliability Engineer will be responsible for:
* Designing and enforcing service-level objectives (SLOs), SLIs, and SLAs to ensure reliability targets are measurable and aligned with business expectations
* Implementing incident response frameworks, including runbooks, postmortems, and blameless RCA processes to drive continuous improvement
* Integrating observability tooling (e.g. Prometheus, Grafana, Datadog, OpenTelemetry) to enable proactive detection and resolution of system anomalies
* Managing infrastructure as code (IaC) using tools like Terraform, Pulumi, or CloudFormation to ensure repeatable, auditable deployments
* Optimizing cost and resource utilization across cloud environments through rightsizing, autoscaling, and lifecycle policies
* Driving chaos engineering initiatives to test system resilience under failure conditions and validate recovery strategies
* Championing security best practices within infrastructure, e.g. secrets management, IAM policies, and vulnerability scanning
* Collaborating with DevOps and platform teams to build paved-road deployment patterns and internal developer portals
* Leading capacity planning and load testing efforts to anticipate scaling needs and prevent bottlenecks
* Contributing to architectural decisions that impact reliability, latency, and fault domains across distributed systems

Please send an up-to-date copy of your CV to be considered for the Site Reliability Engineer role.