Job Description
Senior Observability Engineer
No third parties and no corp-to-corp (C2C) arrangements.
About the role
We are seeking a Senior Observability Engineer to help design, implement, and mature our observability capabilities across modern cloud-based environments. In this role, you will focus on building scalable visibility into application and infrastructure health using New Relic, enabling teams to monitor performance, detect issues earlier, and respond with confidence. You will play a key role in shaping observability strategy, improving telemetry coverage across distributed systems, and creating practical standards for metrics, logs, and tracing. This is a 6-month contract role based onsite in Plano, TX, with the possibility of conversion.
What you'll be doing
- Architect and scale observability solutions using New Relic, including APM, Infrastructure, Logs, Synthetics, and NRQL, to support reliable monitoring across systems and services.
- Design and build dashboards, visualizations, and alerting workflows that provide real-time insight into system behavior, service health, and operational performance.
- Automate observability deployment and configuration through Terraform, OpenTofu, or similar infrastructure-as-code tools to improve consistency and repeatability.
- Analyze telemetry data to identify root causes, troubleshoot incidents, and uncover opportunities to improve reliability, stability, and performance.
- Partner closely with engineering teams to define and implement observability best practices for metrics, logs, and distributed tracing across applications and platforms.
- Support visibility across cloud and containerized environments, helping teams better understand dependencies, service interactions, and production behavior.
- Contribute to a proactive monitoring approach by refining alert quality and helping reduce noise while improving signal for operational teams.
What we're looking for
- Deep hands-on expertise with observability practices and a strong understanding of how to build meaningful monitoring for complex, distributed systems.
- Strong New Relic experience, including the ability to work confidently with NRQL, APM, alerting, dashboards, and related platform capabilities.
- A collaborative mindset with the ability to work effectively across engineering teams and influence adoption of observability standards and best practices.
- Strong problem-solving and troubleshooting skills, with a focus on using telemetry data to investigate incidents and improve system performance.
- Experience working in cloud-native environments, along with a practical understanding of how observability supports reliability in modern infrastructure.
- Comfort with automation and infrastructure as code, especially in environments where repeatability and scalability are important.
- Clear communication skills and the ability to translate technical findings into actionable recommendations for stakeholders and partner teams.
What you'll need
- 7+ years of experience in Cloud, SRE, DevOps, or Observability roles.
- 3+ years of hands-on experience with New Relic, including NRQL, APM, alerting, and dashboard development.
- Strong experience with Terraform or other infrastructure-as-code tools.
- Cloud platform expertise in AWS (preferred), Azure, or GCP.
- Experience with scripting in Python, Go, or Bash.
- Solid understanding of microservices, containers, Docker, Kubernetes, and CI/CD practices.
- A strong troubleshooting and performance optimization mindset.
What we offer
- 6-month contract opportunity.
- Possible conversion based on business needs and fit.
- Onsite work environment in Plano, TX.
- Sick days provided.
- The opportunity to work on high-impact observability initiatives that support system visibility, reliability, and engineering effectiveness.