Requirements
- Familiar with distributed systems, with 2+ years of backend engineering experience (Java, Go, or similar)
- Familiar with service-oriented architectures (SOA) and inter-service communication using gRPC or REST
- Good understanding of AWS (or other hyperscalers), cloud networking, and infrastructure-as-code patterns
- Comfortable working in high-scale environments with millions of events and alerts per day
- A strong communicator, collaborator, and problem solver
What the job involves
- The database market is massive, and MongoDB is at the forefront of its disruption. At MongoDB, we are transforming how developers build and run applications. Our distributed systems power mission-critical services used by thousands of customers around the globe
- We are looking for a Software Engineer to join the Cloud Core Alerts Platform team, building scalable, fault-tolerant, and highly available systems that process millions of events and alerts in real time
- Alert Streaming Systems: Real-time alerting pipelines using Apache Flink and Amazon Kinesis, delivering critical insights for MongoDB Atlas customers
- Event Systems: SOA-based event platforms leveraging gRPC and streaming architectures to power the MongoDB activity feed at scale
- Communication Systems: Distributed services to deliver alert notifications through multiple channels (email, SMS, Slack, PagerDuty) with reliability and resilience
- Third-Party Integrations: Securely managing customer credentials and integrating with external observability providers, ensuring encryption at rest and in transit
- Cross-team Collaboration: Partnering with the broader Customer Observability teams to provide unified telemetry and monitoring experiences
- Design and build distributed systems that process millions of events per second with high availability and low latency
- Lead end-to-end projects from design to production, ensuring scalability, observability, and operational excellence
- Evolve our streaming alerting and eventing platforms, improving reliability, throughput, and developer experience
- Collaborate with cross-functional teams to integrate alerting, event, and communication services into the broader customer observability ecosystem
- Develop secure, multi-tenant integrations with third-party providers, handling sensitive customer credentials safely
- Champion best practices in distributed systems design, including resilience, scalability, and fault tolerance
- Mentor and guide other engineers, sharing expertise on distributed architectures and streaming technologies
Technologies we use
- Languages & Frameworks: Java
- Streaming & Messaging: Apache Flink, Amazon Kinesis, Kafka
- Service Frameworks: gRPC, Protobuf, REST
- Datastores: MongoDB, S3
- Cloud & Infrastructure: AWS, Kubernetes, Terraform