SRE Data and AI

Harvey Nash Plc
Contract
Lanarkshire
United Kingdom

SRE - Data & AI | 6 Month Contract | (Inside IR35) | Hybrid, Glasgow (3 days pw) | Starting ASAP

Day Rate: £DOE

About the Role:

ETS provides a reliable, secure and agile technical platform for the client's business applications and operations. ETS enables business operations and delivers collaboration and productivity tools to internal and external clients. ETS provides production management, quality assurance, and end user services for Institutional Securities and Support Services, and delivers first-line defences to manage the IT risks to the Firm, address the evolving cyber threat landscape, and meet regulatory expectations. ETS also works with Investment Banking to manage the Firm's strategic relationships with the technology community, including venture capitalists, established technology companies, and start-ups.

Main Duties

The role focuses on building and operating highly reliable services through automation, strong observability, disciplined incident and change management, and continuous improvement. You will support the full life cycle of the platform, from data ingestion and cloud data workloads through to semantic layers and AI-driven capabilities.

You will play a key role in production readiness, incident response, release coordination, and operational governance, while contributing to the evolution of the platform as it scales to support new datasets, users, and AI-powered use cases. This role requires a strong engineering mindset, comfort operating complex distributed systems, and a passion for building platforms that teams can trust.

  • Own the reliability, availability, and performance of large-scale data and analytics platforms across DEV, QA, and PROD environments.
  • Apply Site Reliability Engineering (SRE) principles to design and operate resilient services, including defining SLIs/SLOs and driving continuous reliability improvements.
  • Drive an automation-first approach, developing tooling and workflows (primarily in Python) to reduce operational toil and improve repeatability.
  • Build, operate, and enhance CI/CD pipelines supporting data pipelines, cloud data platform assets, semantic models, and platform services.
  • Coordinate and support release and change management, ensuring safe deployments, appropriate validation, and rollback readiness.
  • Act as a senior escalation point for incident response, contributing to root cause analysis, problem management, and preventative remediation.
  • Design and maintain monitoring, alerting, and observability for platform components, ingestion pipelines, and cloud data workloads.
  • Operate and optimize cloud-based data platforms (including Snowflake), ensuring stability, scalability, and cost-effective operation.
  • Support the deployment, reliability, and operational readiness of AI-enabled services and agents, including environment promotion, monitoring, failure handling, and runbook development.
  • Partner with engineering and product teams to ensure AI-driven capabilities are production-ready, observable, and safely integrated into the wider platform ecosystem.
  • Collaborate closely with engineering, data, and product teams in an Agile delivery environment.
  • Contribute to operational documentation, runbooks, and the continuous evolution of the platform operating model.

Essential Skills & Experience:

  • 5+ years of experience in SRE, production engineering, platform engineering, or a related role.
  • Strong experience with automation and scripting, particularly Python.
  • Hands-on experience operating and supporting production systems at scale.
  • Experience with CI/CD pipelines and modern software delivery practices.
  • Solid understanding of change and incident management processes in enterprise environments.
  • Experience working with cloud-based data platforms, including Snowflake.
  • Strong troubleshooting skills across application, platform, and data layers.
  • Working knowledge of relational database systems (e.g., PostgreSQL, MySQL, Oracle, SQL Server, or equivalent).
  • Excellent communication skills and ability to collaborate across multiple teams.

Technical Competencies

  • Python (automation, tooling, operational scripts)
  • CI/CD tooling (e.g., Git-based workflows, pipeline automation)
  • Monitoring, alerting, and observability patterns
  • Cloud-native or modern data platform operations
  • Infrastructure-as-code and configuration management concepts
  • Unix/Linux environments and basic networking concepts

Desirable Qualifications

  • Experience with semantic data modelling and analytics-oriented data platforms.
  • Exposure to Snowflake Cortex or other GenAI/LLM-based tools in production or near-production environments.
  • Familiarity with data visualization and BI tools such as Tableau, Power BI, or equivalent, and how they are used to consume curated semantic layers.
  • Experience working in Agile delivery environments, including participation in sprint planning, backlog refinement, and iterative delivery alongside engineering and product teams.
  • Familiarity with operational data domains (ITSM, incidents, change, alerts).

This role has been deemed Inside IR35 by the client. Applicants must hold, or be happy to apply for, a valid Basic Disclosure Scotland. Please click the link to apply.