LLM Inference & Deployment Engineer (Air-Gapped Environments)
3-month+ contract
Inside IR35
Hybrid working (2-3 days in London)
You've deployed 70B-parameter models on GPU clusters with no internet access. You know the difference between a model that works in a notebook and one that runs reliably in production under compliance scrutiny. If that's your world, we want to talk.
This is a genuinely specialist role. The platform you'll be working on runs multiple large-scale LLMs concurrently: frontier models for text screening, code LLMs for analysis, and transformer encoders for classification. Everything runs in an air-gapped environment with a fixed compute budget and zero external API access.
You'll own the inference infrastructure end-to-end: GPU allocation strategy, quantisation decisions, batching, determinism controls, and offline deployment packaging. The system has to be fast, reliable, and auditable. That's a rare combination of skills, and this role is for someone who has genuinely done it before.
What we're looking for
Hands-on experience deploying large (70B-parameter-class) LLMs on GPU clusters without internet access
Ownership of production inference infrastructure: GPU allocation, quantisation, batching, and determinism controls
Experience with offline deployment packaging and operating under compliance and audit scrutiny
If you are available and interested in this new role, please send a current CV.