Job Title: Site Reliability Engineer
Salary: £90k + equity
Company Description: EQUALS - World’s largest social music network
Job Description: As the sole SRE, you will own the infrastructure powering a social network of over a million users. You will manage a high-scale AWS environment using Pulumi, optimize PostgreSQL 17 for massive datasets, and ensure system reliability during exponential growth. This is a high-impact role where you control the platform’s operational health.
Location: London, UK
Why this role is remarkable:
- Take full autonomy as the sole infrastructure owner for a platform with 1M+ users and exponential monthly growth.
- Tackle massive scale challenges, including music catalog ingestions of over 1 billion rows and real-time chat infrastructure.
- Step in as the direct replacement for a departing senior lead, in a high-impact environment with a significant £100k+ equity package.
What you will do:
- Manage and evolve AWS infrastructure via Pulumi (TypeScript), covering ECS/Fargate, RDS, ElastiCache, and Lambda.
- Own the monitoring and observability stack using Datadog APM to reduce alert fatigue and lead incident response.
- Optimize data pipelines and performance, including Airbyte replication and PostgreSQL tuning for high-concurrency music streaming.
The ideal candidate has:
- Deep expertise in AWS (ECS/Fargate, RDS, SQS) and Infrastructure-as-Code using Pulumi, Terraform, or CDK.
- Strong PostgreSQL and Redis knowledge, specifically regarding performance tuning, indexing, and connection pooling at scale.
- Proven experience in incident response and CI/CD management, with the ability to stay calm and diagnose issues in production.