Use your on-call shift to prevent incidents from ever happening.
Run our infrastructure with etcd, Envoy, Ansible, and OpenShift.
Make our monitoring and alerting fire on early symptoms rather than only on outages.
Document every action so your findings turn into repeatable processes, and then into automation.
Improve the deployment process to make it intuitive and fast for the engineering teams.
Design, build, and maintain core infrastructure pieces that allow One Data to scale to hundreds of thousands of concurrent users.
Debug production issues across services and levels of the stack.
Plan the growth of the One Data platform infrastructure.
Code infrastructure automation with Terraform.
Build and enhance our internal CLI tool in Go.
Improve our Prometheus monitoring or build new metrics.
Help platform teams deploy and fix new versions of One Data.
Plan, prepare for, and execute the migration of our core platform tools from virtual machines to cloud-native, container-based deployments with Kubernetes on the OpenShift platform.
Plan and prepare for a hybrid multi-cloud strategy for scaling the One Data platform.
Think in systems: edge cases, failure modes, behaviors, and specific implementations.
Know your way around Linux and the Unix shell, and have strong shell scripting skills.
Have strong programming skills in Java, Kotlin, Node.js, or Go.
Have an urge to collaborate and communicate asynchronously.
Have an urge to document all the things so you don’t need to learn the same thing twice.
Have an enthusiastic, go-for-it attitude. When you see something broken, you can’t help but fix it.
Have an urge to deliver quickly and iterate fast.
Have experience with Nginx, HAProxy, Envoy, Docker, Kubernetes, Terraform, or similar technologies.
Have strong source control (SCM) skills with Git.
Have strong skills with Splunk, Elastic, Datadog, or similar log management tools.
Have a good understanding of JVM internals and are eager to dig deeper by analyzing heap dumps and thread dumps with JFR, VisualVM, or similar toolchains.
Have a good understanding of analytics and monitoring, and have experience working with Prometheus and Grafana.
We back our colleagues and their loved ones with benefits and programs that support their holistic well-being. That means we prioritize their physical, financial, and mental health through each stage of life.
Competitive base salaries
Support for financial well-being and retirement
Comprehensive medical, dental, vision, life insurance, and disability benefits (depending on location)