About the role
Own the infrastructure for running local LLMs at scale — GPU clusters, inference servers, and AI-optimized deployment pipelines.
You'll ensure our self-hosted AI systems are fast, reliable, and cost-efficient across multiple client environments, both in our Singapore and Vietnam (SG/VN) data centres and inside customer VPCs.
This role sits at the intersection of MLOps, classic SRE, and platform engineering. You ship Helm charts, Terraform, and observability dashboards rather than ad-hoc scripts.
What we're looking for
- 3+ years operating production Kubernetes
- Experience with vLLM, TGI, Triton, or comparable LLM-serving stacks
- Solid Linux and networking fundamentals; Terraform or Pulumi experience a plus
- You believe a runbook is part of the deliverable