This role has expired
Dunelm
Site Reliability Engineer
Multiple locations
Hybrid
Description
Hybrid requirements: This role has flexible working patterns.
This role can be based at our London or Leicester office and is hybrid (i.e. a mix of working from home and working in the office).
We’re searching for an Engineer to join our Site Reliability Engineering (SRE) team. The team is agile, data-driven, and motivated, comprising software and systems engineers. We’re dedicated to creating meaningful observability and monitoring solutions, while automating manual tasks, ensuring product quality and forging operational excellence. A blame-free DevOps culture and collaboration are at the core of our approach.
What you’ll be doing
As a Site Reliability Engineer at Dunelm, you will become a key member of the SRE team. You are motivated and enthusiastic, and able to use your operational and engineering knowledge to help develop effective tools, observability solutions, pipelines and more, so that the wider engineering and platform teams at Dunelm can create, update and release with confidence.
Responsibilities:
Observability Development: Designing, building, deploying, running and – ultimately – owning observability tooling, such as dashboards, monitoring and alerting.
Embedded Consultancy: Working with other teams throughout the Dunelm technical space to help increase their SRE maturity level, mainly by helping them define Service Level Objectives (SLOs) and Service Level Indicators (SLIs), and by working on the integrations that produce the telemetry they require.
DevOps Best-Practice Advocacy: Promoting a DevOps culture, through ‘shift-left’ testing, non-functional (security, performance etc.) testing, continuous integration and deployment and working with other teams to share the responsibilities of the software that is built.
Incident Response: Being available to help investigate incidents in real time, sometimes outside normal working hours as part of our on-call rota. Helping to ascertain impact and find observability gaps during these investigations. Taking part in ‘blameless post-mortems’, focusing on collaboration and knowledge-sharing.
Workflow: Helping to create and refine work tickets, breaking down larger pieces of work into actionable pieces. Ensuring relevant knowledge is shared with the rest of the team while working on these tickets and clearly articulating any blocking circumstances.
Code Quality and Risk Mitigation: Reviewing the team’s output to ensure all code is highly maintainable, supportable, and minimises operational risk.
Mentorship and Coaching: Mentoring and guiding other team members, including less experienced engineers, providing feedback and coaching to help them reach their full potential.
Research and Learning: Researching new technologies and architectural patterns by conducting technical Proofs of Concept (PoCs), and proposing improvements to existing platforms as well as developing new solutions. You will also be given the opportunity to do team-based and independent learning on a regular basis, to improve your own and the team’s knowledge.
What we’ll look for in you
Essential skills
Amazon Web Services: We run most of our back-end software in AWS Lambda functions, with some containerised software (ECS / Fargate) and some running on cloud servers (EC2). You will need experience with, and good knowledge of, all of these, general AWS networking principles (VPC, security groups etc.), plus other AWS services including, but not limited to: S3, EventBridge, SQS / SNS, RDS and DynamoDB.
Development Experience: You will be a solid developer, experienced in building high-quality, testable applications and tools. You will know how to create effective tests (unit and integration) and be familiar and comfortable with different ways of tackling a problem – for example pair and mob programming. Our stack is mainly TypeScript and Python, so experience with both would be distinctly advantageous.
Observability Knowledge: You can explain the fundamental aspects of observability, including telemetry and real user monitoring (RUM), in detail, and you know how to use them to effectively observe running software. You also understand sampling and how it can be used most effectively.
Problem-Solving Prowess: You possess exceptional problem-solving skills, capable of addressing intricate challenges that may arise within our technology landscape. You are also a ‘detective’, using your skills and knowledge to collect evidence that eliminates what a problem is not, leading you to the most likely cause(s).
Pipeline Expertise: You will understand how to create, deploy and troubleshoot CI / CD pipelines (we use GitLab) to run tests / checks, create builds and ultimately deploy software in various environments.
Technology Proficiency: You have solid knowledge of various technologies, tools, frameworks and patterns related to the previous five points, including, but not limited to: IaC (e.g. Terraform, Pulumi, CDK), Lambda runtime programming languages (e.g. TypeScript, Python), containerised applications (Docker), event-driven architecture and POSIX-based shells (e.g. Bash, zsh).
Tech Enthusiasm: Your passion for technology drives you to explore and embrace the latest innovations continuously. This dedication to growth and learning is essential in staying ahead in our ever-evolving tech landscape.
Desirable skills
OpenTelemetry: Previous experience or demonstrated knowledge of working with OpenTelemetry solidifies your observability expertise, enhancing our monitoring and diagnostic capabilities.
Google Cloud Platform (GCP): Although Dunelm’s distributed systems are overwhelmingly deployed on AWS, we do have strategic deployments in GCP, so any working knowledge of this platform would demonstrate your breadth of cloud knowledge.
Role tech stack
aws-lambda
google-cloud-platform
typescript
python
terraform
docker
bash
Life at Dunelm
Culture overview
We're here to help our customers create the joy of feeling truly at home. Join us and you'll find our caring and inclusive culture makes this a place you'll feel right at home too.
Learn: Wherever you work with us and in whatever role, you'll have every opportunity to keep on learning and keep on growing.
Thrive: We'll take care of you, and make sure your everyday needs are met, so you can focus on doing a great job and being the best version of you.
Belong: We embrace diversity in all its forms. We'll celebrate the individual you are and value the unique contribution you bring.
Colleague Networks: All of our colleagues have the opportunity to be part of our four colleague networks. These are Disability & Neurodiversity, LGBTQ+, Gender Equality and Ethnicity & Race. Each network has co-chairs and an exec sponsor who work closely with us to ensure that we are a workplace where everyone feels supported, celebrated, valued and heard.
A chance to give something back: We're serious about our role in society. Each of our stores is partnered with a local charity and has its own community Facebook page. And we offer our Pausa Cafés for free to local community groups. We're also proud partners of the mental health charities, Mind (UK and Wales), SAMH (Scotland) and Inspire (Northern Ireland). And each year, we'll give you a day's paid leave to support a charity that matters to you.
Work your way: We have adapted our ways of working to make sure everyone can feel at home wherever they work. For many colleagues at our Head Office in Leicester and our Central London hub, that now includes working on a hybrid basis, combining days in the office with time spent working at home or elsewhere across the business.
Employee benefits
Bonus Scheme
Childcare Vouchers
Flexible Working
Free Parking
Laptop
Learning Allowance
Life Insurance
Pension
Private Healthcare
Share Options
Wellbeing Programme
Office vibe
Birthday Off
City Centre
Hackathons
Office Dog
Open Plan
Social Events
Tech at Dunelm
Leadership
John Gahagan
Chief Technology and Information Officer
Tech overview
Our Tech, Digital and Data teams are transforming literally every aspect of our business – from the way we manage and make use of our data, to the relationships we share with our customers. Already, their impact has been felt across the business, and indeed by our customers. But this is just the start and we know there are bigger opportunities ahead.
Check out our tech blog for tales behind our talented teams: https://engineering.dunelm.com/
Keep on growing: Join us on the tech side and you'll have access to a huge array of learning and development opportunities, including a variety of internally created workshops and externally accredited courses. We also have a substantial tech-specific budget to fund e-Learning licenses, conference visits, resources, and qualifications, plus dedicated mentors, well-being buddies and a wide range of network groups to support you as you progress.
Engineering principles
Agile Process
Code Reviews
Communication and collaboration
Continuous delivery
Continuous Development
Continuous integration
Infrastructure as code
Mentoring
Microservices
Pair programming
Scrum
Test Driven Development
Unit testing
Company tech stack
javascript
aws-lambda
graphql
react
typescript
jest
nodejs
sql
python
java