The Site Reliability Engineering team designs and builds the global infrastructure on which we deploy our services, focusing on the flagship MongoDB Atlas platform. As our customers grow and globalize, our services must satisfy demands for low-latency requests around the globe and comply with various data sovereignty requirements. The SRE team's mission is to continuously lower the operational burden associated with this complex infrastructure while increasing internal visibility into system health.
We are looking for candidates based in Dublin for our hybrid working model.
Responsibilities
- Design and build the infrastructure for a global cloud service that comprises hundreds of thousands of MongoDB clusters, processes a billion metrics per day, and replicates tens of billions of database writes to our backup service.
- Design, implement, and troubleshoot automation and monitoring of services that seamlessly span the globe, including multiple cloud providers.
- Become an expert in infrastructure performance, optimizing from the application level all the way down to the firmware.
- Build for resilience; participate in a weekly on-call rotation, with the goal of keeping pager incidents at zero.
- Improve infrastructure capabilities, optimizing for cost, simplicity, and maintainability.
Requirements
- 3+ years of experience running a mission-critical service at scale in a Linux environment.
- Firm grasp of at least one modern programming language beyond basic scripting.
- Familiarity with web and network protocols and standards (HTTP, TLS, DNS, etc.).
- Bachelor's degree in Computer Science or equivalent experience.
- Experience writing automation tools and eagerness to "automate all the things".
Nice to Have
- Experience building large applications from scratch, including CI/CD infrastructure.
- Experience in networking, security, hardware, or OS performance tuning.
- Experience with at least one of the major cloud providers (Amazon Web Services, Google Cloud, Microsoft Azure).
- Experience managing Kubernetes clusters or other container orchestration infrastructure.
- Experience with observability of large-scale distributed systems.
MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter.
MongoDB is an equal opportunities employer.