Senior AI DevOps / LLMOps

  • Remote
    • Krakow, Podlaskie, Poland
    • Krosno, Dolnośląskie, Poland
    • Lublin, Lubelskie, Poland
    • Poland, Mazowieckie, Poland
    • Poznan, Wielkopolskie, Poland
    • Warsaw, Warmińsko-Mazurskie, Poland
    • Pilsen, Plzeňský kraj, Czechia
    • Praha, Praha, Hlavní město, Czechia
    • Riga, Rīga, Latvia
    • Vilnius, Vilniaus apskritis, Lithuania
    • Athens, Attikí, Greece
    • Crete, Kríti, Greece
    • Budapest, Budapest, Hungary
    • Miskolc, Borsod-Abaúj-Zemplén, Hungary
    • Brussels, Brussels, Belgium
    • Liège, Walloon Region, Belgium
    • Mons, Walloon Region, Belgium
    • Namur, Walloon Region, Belgium
    • Sofia, Sofia, Bulgaria
    • Bordeaux, Nouvelle-Aquitaine, France
    • Grenoble, Auvergne-Rhône-Alpes, France
    • Lille, Hauts-de-France, France
    • Lyon, Auvergne-Rhône-Alpes, France
    • Montpellier, Occitanie, France
    • Nantes, Pays-de-la-Loire, France
    • Paris, Île-de-France, France
    • Toulouse, Occitanie, France
    • Baden-Baden, Baden-Württemberg, Germany
    • Bad Homburg, Hessen, Germany
    • Augsburg, Bayern, Germany
    • Aachen, Nordrhein-Westfalen, Germany
    • Bergisch Gladbach, Nordrhein-Westfalen, Germany
    • Berlin, Berlin, Germany
    • Bernau, Brandenburg, Germany
    • Bielefeld, Nordrhein-Westfalen, Germany
    • Bramsche, Niedersachsen, Germany
    • Chemnitz, Sachsen, Germany
    • Darmstadt, Hessen, Germany
    • Dresden, Sachsen, Germany
    • Dusseldorf, Nordrhein-Westfalen, Germany
    • Düsseldorf, Rheinland-Pfalz, Germany
    • Essen, Nordrhein-Westfalen, Germany
    • Frankfurt, Hessen, Germany
    • Freiburg, Baden-Württemberg, Germany
    • Georgsmarienhütte, Niedersachsen, Germany
    • Görlitz, Sachsen, Germany
    • Göttingen, Niedersachsen, Germany
    • Greifswald, Mecklenburg-Vorpommern, Germany
    • Halle, Sachsen-Anhalt, Germany
    • Hamburg, Hamburg, Germany
    • Hannover, Niedersachsen, Germany
    • Heinsberg, Nordrhein-Westfalen, Germany
    • Karlsruhe, Baden-Württemberg, Germany
    • Kiel, Schleswig-Holstein, Germany
    • Koln, Nordrhein-Westfalen, Germany
    • Leipzig, Sachsen, Germany
    • Leverkusen, Nordrhein-Westfalen, Germany
    • Lingen, Niedersachsen, Germany
    • Magdeburg, Sachsen-Anhalt, Germany
    • Mainz, Rheinland-Pfalz, Germany
    • Mittenwalde, Brandenburg, Germany
    • Munich, Bayern, Germany
    • Münster, Nordrhein-Westfalen, Germany
    • Neu-Ulm, Bayern, Germany
    • Neuruppin, Brandenburg, Germany
    • Neuss, Nordrhein-Westfalen, Germany
    • Nuremberg, Bayern, Germany
    • Osnabrück, Niedersachsen, Germany
    • Paderborn, Nordrhein-Westfalen, Germany
    • Potsdam, Brandenburg, Germany
    • Quedlinburg, Sachsen-Anhalt, Germany
    • Rimbach, Hessen, Germany
    • Rostock, Mecklenburg-Vorpommern, Germany
    • Schwerin, Mecklenburg-Vorpommern, Germany
    • Stuttgart, Baden-Württemberg, Germany
    • Ulm, Baden-Württemberg, Germany
    • Varel, Niedersachsen, Germany
    • Viechtach, Bayern, Germany
    • Warendorf, Nordrhein-Westfalen, Germany
    • Wiesbaden, Hessen, Germany
    • Wuppertal, Nordrhein-Westfalen, Germany
    • Lisbon, Lisboa, Portugal
    • Porto, Lisboa, Portugal
    • Zagreb, Zagrebačka županija, Croatia
    • Zagreb, Grad Zagreb, Croatia
    • Sarajevo, Federacija Bosne i Hercegovine, Bosnia and Herzegovina
    +88 more
  • Engineering

- Full-time, remote position
- Fluent English required

Job description

At TechBiz Global, we provide recruitment services to the top clients in our portfolio. We are currently seeking a Senior AI DevOps / LLMOps specialist to join one of our clients' teams. If you're looking for an exciting opportunity to grow in an innovative environment, this could be the perfect fit for you.

Key Responsibilities

1. Automation of Build-to-Production

- Design and implement robust CI/CD pipelines tailored for AI, covering model weights, dataset versioning, and application code.

- Develop specialized workflows for PromptOps, ensuring that system prompts are version-controlled, tested for regressions, and deployed with the same rigor as traditional code.

- Automate the deployment of Agentic workflows, managing the complexities of stateful AI interactions and multi-agent handoffs.

2. AI Infrastructure as Code (IaC)

- Provision and manage high-performance compute environments (GPU clusters, TPU pods) using Terraform, Pulumi, or Ansible.

- Define and enforce Policy-as-Code for AI endpoints to ensure compliance with security, cost-usage limits, and data residency requirements.

- Maintain a consistent environment across Hybrid Infrastructure, ensuring seamless parity between On-Premises development and Cloud production.

3. Safe Experimentation & Controlled Releases

- Architect Progressive Delivery strategies for AI, including Canary releases, Blue-Green deployments, and Shadowing (where new models run in parallel with production to compare outputs).

- Build “Evaluation-in-the-Loop” gates within the pipeline to automatically test for bias, hallucination, and performance degradation before a release.

- Implement A/B testing frameworks specifically designed for LLM outputs and agentic behavior.
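For illustration only, the Shadowing and evaluation-gate ideas above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of the client's actual pipeline: the names `run_shadow`, `gate_passes`, and `ShadowResult` are hypothetical, the models are plain callables, and the 0.95 agreement threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ShadowResult:
    prompt: str
    production_output: str
    candidate_output: str
    agrees: bool


def run_shadow(
    prompts: List[str],
    production_model: Callable[[str], str],
    candidate_model: Callable[[str], str],
    same: Callable[[str, str], bool],
) -> List[ShadowResult]:
    """Run both models on the same prompts and record whether they agree.

    Only the production output would be returned to users; the candidate
    output is recorded for comparison (hypothetical setup).
    """
    results = []
    for prompt in prompts:
        prod = production_model(prompt)
        cand = candidate_model(prompt)
        results.append(ShadowResult(prompt, prod, cand, same(prod, cand)))
    return results


def gate_passes(results: List[ShadowResult], min_agreement: float = 0.95) -> bool:
    """Release gate: pass only if the agreement rate meets the threshold.

    The threshold is an assumed placeholder; an empty result set fails.
    """
    if not results:
        return False
    agreement = sum(r.agrees for r in results) / len(results)
    return agreement >= min_agreement
```

In a real pipeline the `same` comparison would likely be a task-specific evaluator (for example, a rubric or a scoring model) rather than string equality, and the gate would sit alongside separate checks for bias and hallucination.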

4. Monitoring & Observability

- Establish deep observability into Inference Endpoints, tracking metrics like tokens-per-second, latency, and drift in model accuracy.

- Integrate feedback loops that capture production “edge cases” to feed back into the training and fine-tuning pipelines.
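As a rough illustration of the inference metrics named above, the sketch below computes aggregate tokens-per-second and a latency percentile from per-request records. The record shape (`InferenceRecord`) and function names are hypothetical, and the nearest-rank percentile is just one simple choice; a production setup would more likely export these through a metrics system.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class InferenceRecord:
    latency_s: float     # wall-clock time for the request, in seconds
    output_tokens: int   # number of tokens generated


def tokens_per_second(records: List[InferenceRecord]) -> float:
    """Aggregate throughput: total generated tokens over total latency."""
    total_time = sum(r.latency_s for r in records)
    if total_time <= 0:
        return 0.0
    return sum(r.output_tokens for r in records) / total_time


def latency_percentile(records: List[InferenceRecord], pct: float) -> float:
    """Nearest-rank percentile of request latency (pct in (0, 100])."""
    if not records:
        raise ValueError("no records")
    ordered = sorted(r.latency_s for r in records)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```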

Job requirements

Must-Have Technical Skills:

- Orchestration: Advanced Kubernetes (K8s) skills, specifically with KubeFlow, Ray, or NVIDIA Triton.

- CI/CD & IaC: Expertise in GitHub Actions/GitLab CI, and Terraform or Pulumi.

- AI Tooling: Experience with Weights & Biases, MLflow, LangSmith, or Arize Phoenix.

- Hardware: Understanding of GPU virtualization, CUDA drivers, and on-premises hardware management.

- Security: Familiarity with Open Policy Agent (OPA) and secret management (Vault).

Experience:

- 10+ years in DevOps, SRE, or Cloud Engineering.

- 2+ years of hands-on experience in MLOps or LLMOps, specifically moving LLMs from notebook to production.

- Proven experience managing Hybrid Cloud environments (e.g., AWS/Azure + Private Data Center).
