About the Role

We're looking for a Senior Site Reliability Engineer to help scale our observability practices, improve system reliability, and drive high availability across our technology platform. You’ll work closely with engineering and platform teams, combining software engineering skills with systems thinking to build reliable, observable, and efficient services.

What You’ll Do

- Lead Observability: Evolve how we collect and process telemetry using OpenTelemetry. Focus on actionable, cost-effective metrics and traces.
- Define Reliability Standards: Help teams adopt SLIs/SLOs and build a culture of service ownership.
- Incident Response: Take charge during incidents, lead post-incident reviews, and drive long-term fixes.
- Infrastructure Automation: Use tools like Pulumi, Terraform, and CDK to manage AWS infrastructure and CI/CD pipelines.
- Write Code: Build tooling and automation using TypeScript (and optionally Rust). Contribute to shared libraries and platforms.
- Mentor and Collaborate: Guide other engineers, share knowledge, and help grow the SRE mindset.
- Drive Innovation: Explore new tools and methods in observability and reliability. Lead proof-of-concepts and continuous improvement.

What You’ll Bring

Essential:

- Strong experience in TypeScript or similar languages.
- Solid understanding of OpenTelemetry and modern observability tools (e.g. Datadog, Grafana).
- Deep knowledge of SRE principles: SLIs/SLOs, automation, monitoring, and incident response.
- Hands-on experience with AWS (e.g. Lambda, ECS, S3, DynamoDB).
- Comfortable with Linux, command line, and system debugging.
- Experience with infra-as-code tools like Pulumi or Terraform.
- Familiar with Kubernetes, CI/CD pipelines, and automated deployments.
- Strong problem-solving skills and attention to detail.

Desirable:

- Experience with Rust or Go.
- Deep understanding of trace sampling and OpenTelemetry at scale.
- Experience reducing observability/cloud costs.
- Exposure to Google Cloud Platform.
- Familiarity with retail tech challenges is a bonus.

How You Work

You align with our core values:

- Take ownership and build trust.
- Communicate openly and support your team.
- Stay curious and always look to improve.
- Embrace change and challenges.
- Think long-term and work collaboratively.

