Job Title: Senior Software Engineer – SRE Automation
Location: Remote (U.S.-based)
Employment Type: Contract (40 hours/week)
Pay Rate: $85/hour
Duration: 6+ months with potential to extend
Introduction
Are you a passionate software engineer with deep experience in automation, observability, and building scalable systems? The Planet Group is seeking a Senior Software Engineer to join a high-impact Site Reliability Engineering (SRE) team within a rapidly scaling healthcare technology environment. This role will focus on automating critical observability and reliability tools, collaborating closely with SRE, infrastructure, and development teams to enhance operational efficiency and ensure the uptime and performance of core applications.
This is a highly technical and visible role best suited for engineers who thrive on ownership and enjoy working with modern cloud-native architecture and infrastructure-as-code practices.
Required Skills & Qualifications
- Bachelor’s degree in Computer Science or related field (or equivalent professional experience)
- 4+ years of professional software development experience
- Proven expertise in Python and building scalable web services using FastAPI
- Deep understanding of API design, lifecycle, and developer experience
- Hands-on experience with AWS services such as EC2, S3, RDS, Lambda, EKS, IAM, and CloudWatch, as well as Redis
- Experience with Docker and supporting applications in Kubernetes environments
- Strong understanding of relational databases (PostgreSQL, MySQL)
- Experience designing and maintaining CI/CD workflows using GitHub Actions
- Strong collaboration and communication skills in cross-functional, global teams
- Agile software development experience (Scrum, Kanban, etc.)
Preferred Skills & Qualifications
- Experience with Terraform or CloudFormation for infrastructure as code
- Prior experience in SRE or DevOps-focused roles
- Familiarity with Datadog observability tools and automation for SLI/SLO dashboards and alerts
- Experience with AI-powered coding assistants like GitHub Copilot
- Bonus: Familiarity with modern web front-end technologies such as React, TypeScript, or JavaScript
- Bonus: Hands-on experience with Chaos Engineering or AWS Fault Injection Simulator
Day-to-Day Responsibilities
- Develop and maintain automation tools to support SRE observability, reliability, and incident response workflows
- Build and manage scalable web services, APIs, and microservices using Python and FastAPI
- Automate the creation and management of Datadog dashboards, monitors, and alerts
- Work closely with SRE teams to implement and scale SLI/SLO tracking
- Design and optimize data schemas, queries, and pipelines
- Maintain CI/CD pipelines in GitHub Actions to enable continuous delivery and testing
- Participate in design reviews, contribute to documentation, and uphold high-quality engineering standards
- Collaborate across infrastructure, engineering, and product teams to ensure system resilience and observability
Company Benefits & Culture
- Join a mission-driven, high-impact team working on cutting-edge healthcare technology
- Collaborate in a supportive, fast-moving, and globally distributed engineering environment
- Work fully remote with modern development tooling and processes
- Contribute to systems that support millions of users globally
Apply today if you’re ready to build powerful automation tools that drive real-world impact and help scale reliability across a global healthcare platform.