DevOps / Platform Engineer (4631)

Keep Simple
Published on June 3

Job description

Global insurance and asset management company seeks a responsible, organized, dynamic and team-oriented person.


Responsibilities and duties

Role Summary

We are seeking a Senior DevOps / Platform Engineer to design, build, and operate the cloud infrastructure, CI/CD pipelines, and developer platform that underpin our AI and digital innovation initiatives. This is a cloud-agnostic role — you will architect infrastructure and platform capabilities that work across AWS, Azure, and GCP, ensuring our engineering teams can build, deploy, and operate AI-powered applications with speed, security, and reliability.

A distinguishing aspect of this role is the MLOps dimension. You will build and maintain the infrastructure for AI/ML model lifecycle management: training environments, model serving, experiment tracking, automated evaluation, and production monitoring. You will ensure that deploying an AI model to production is as reliable, repeatable, and observable as deploying a traditional software service.

Key Responsibilities

CI/CD Pipeline Engineering

* Design and maintain end-to-end CI/CD pipelines for all engineering workstreams: application code, infrastructure-as-code, AI/ML models, data pipelines, and automation scripts;
* Build multi-stage deployment pipelines with automated testing gates: unit tests, integration tests, security scans (SAST/DAST/SCA), AI model evaluation, and infrastructure validation;
* Implement deployment strategies: blue/green, canary, rolling updates, and feature flags — for both traditional services and AI model endpoints;
* Design and maintain artifact management: container registries, model registries, package repositories, and versioned infrastructure modules;
* Build pipeline observability: deployment frequency tracking, lead time for changes, change failure rate, and mean time to recovery (DORA metrics);
* Implement GitOps workflows using ArgoCD, Flux, or equivalent for declarative infrastructure and application deployment.

Cloud Infrastructure (Cloud-Agnostic)

* Design and maintain cloud infrastructure across AWS, Azure, and/or GCP — with emphasis on portability and avoiding deep vendor lock-in where practical;
* Implement infrastructure-as-code using Terraform (primary), Pulumi, or CloudFormation/Bicep with modular, reusable, and well-tested infrastructure modules;
* Design and operate Kubernetes clusters (EKS, AKS, GKE) for containerized workloads — including AI model serving, API services, and batch processing;
* Build and manage serverless compute infrastructure (Lambda, Azure Functions, Cloud Functions) for event-driven workflows and lightweight AI inference;
* Implement cloud cost optimization: right-sizing, reserved capacity planning, spot/preemptible instance strategies, and automated cost monitoring and alerting;
* Design multi-environment strategies: development, staging, production — with proper isolation, data governance, and promotion workflows.

Security & Compliance Infrastructure

* Implement security-as-code: infrastructure security policies (Checkov, tfsec, Sentinel), container image scanning (Trivy, Snyk), and runtime security monitoring;
* Design and enforce zero-trust networking: service mesh (Istio, Linkerd), network policies, private endpoints, and API gateway security;
* Implement secrets management using HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or equivalent;
* Build and maintain identity and access management: service accounts, workload identity, least-privilege IAM policies, and RBAC for Kubernetes and cloud resources;
* Ensure infrastructure compliance with SOC 2, ISO 27001, GDPR, and industry-specific regulations;
* Implement audit logging, security alerting, and automated compliance scanning across all infrastructure.

MLOps & AI Infrastructure

* Design and build ML training infrastructure: GPU/TPU compute provisioning, distributed training support, and experiment tracking (MLflow, Weights & Biases);
* Build model serving infrastructure: containerized model endpoints, auto-scaling (including GPU-based scaling), A/B testing, and model routing;
* Implement model registry and lifecycle management: model versioning, staging, approval workflows, and automated deployment pipelines;
* Build AI-specific monitoring: model latency, throughput, error rates, input/output drift detection, and token usage cost tracking;
* Design and operate vector database infrastructure for RAG systems: deployment, scaling, backup, and disaster recovery;
* Implement LLM gateway/proxy infrastructure: centralized API routing, rate limiting, cost controls, caching, and provider failover.

Reliability & Observability

* Design and implement a comprehensive observability stack: metrics (Prometheus/Grafana, Datadog), logs (ELK, Loki, CloudWatch), traces (Jaeger, OpenTelemetry), and AI-specific monitoring;
* Build and maintain alerting systems with proper escalation policies, runbooks, and automated remediation where possible;
* Implement SLI/SLO frameworks for all production services — including AI model endpoints — with error budget tracking;
* Design disaster recovery and business continuity plans: multi-region deployment, data replication, backup strategies, and failover testing;
* Build chaos engineering practices: fault injection, game days, and resilience testing for both infrastructure and AI systems;
* Maintain incident management processes: on-call rotations, incident response playbooks, and post-incident review facilitation.

Developer Experience & Platform

* Build and maintain an Internal Developer Platform (IDP) that enables self-service infrastructure provisioning, environment management, and deployment;
* Design developer workflows: local development environments (dev containers, Codespaces), preview environments, and rapid feedback loops;
* Build and maintain developer documentation: architecture decision records (ADRs), runbooks, onboarding guides, and platform usage guidelines;
* Implement platform abstractions that reduce cognitive load on application developers while maintaining flexibility for power users;
* Design and operate shared services: database provisioning, cache infrastructure, message queue clusters, and monitoring stack.


Requirements and qualifications

Required Qualifications / Skills

* 6+ years of experience in DevOps, SRE, or platform engineering, including at least 2 years supporting AI/ML workloads in production;
* Expert-level experience with infrastructure-as-code: Terraform (primary), with exposure to Pulumi, CloudFormation, or Bicep;
* Production experience with Kubernetes (EKS, AKS, or GKE): cluster management, Helm charts, operators, auto-scaling, and troubleshooting;
* Deep experience with CI/CD pipeline design: GitHub Actions, GitLab CI, Azure DevOps Pipelines, or Jenkins — including multi-stage pipelines with automated quality gates;
* Strong cloud infrastructure experience across at least two of: AWS, Azure, GCP — with hands‑on skills in networking, compute, storage, identity, and security services;
* Proficiency in scripting and automation: Python, Bash, PowerShell, and at least one of Go or TypeScript;
* Experience building observability stacks: Prometheus, Grafana, Datadog, ELK, OpenTelemetry, and alerting/on-call systems (PagerDuty, Opsgenie);
* Strong understanding of security engineering: secrets management, network security, IAM, container security, and compliance automation;
* Experience with GitOps practices and tools: ArgoCD, Flux, or equivalent;
* Fluent English, both written and spoken;
* Proven experience in international projects, including collaboration with global and multicultural teams;
* Strong communication, stakeholder management, and problem‑solving skills;
* Previous experience mentoring engineers or acting as a technical lead is strongly preferred.

Preferred Qualifications

* Hands‑on MLOps experience: model serving (vLLM, TensorRT, Triton Inference Server, SageMaker Endpoints, Azure ML), model registries (MLflow, Weights & Biases), and GPU infrastructure management;
* Experience building LLM gateway/proxy infrastructure: LiteLLM, AI Gateway, or custom routing layers;
* Familiarity with platform engineering tools: Backstage, Port, Humanitec, or custom developer portals;
* Experience with service mesh technologies: Istio, Linkerd, or Consul Connect;
* Knowledge of FinOps practices: cloud cost management, tagging strategies, showback/chargeback models;
* Experience in insurance, financial services, or other regulated industries with strict compliance requirements;
* Certifications: CKA/CKAD (Kubernetes), AWS Solutions Architect / DevOps Engineer, Azure DevOps Engineer Expert, HashiCorp Terraform Associate;
* Experience with chaos engineering tools: Chaos Monkey, Litmus, Gremlin;
* Familiarity with edge/hybrid deployment patterns for AI models;
* Experience building and operating data platform infrastructure: Spark clusters, Kafka, Airflow/Prefect deployments.

Base Requirements

* DevOps Experience: All team members must demonstrate hands-on experience with CI/CD pipelines, containerization (Docker/Kubernetes), cloud platforms, and deployment automation;
* Infrastructure as Code: Proficiency with at least one IaC toolchain (Terraform, Pulumi, CloudFormation/Bicep) is required across all roles, not just DevOps;
* Cloud Platforms: Working knowledge of at least one major cloud provider (AWS, Azure, or GCP);
* Version Control & Collaboration: Git-based workflows, code review practices, and collaborative development are expected of every team member.

Education

* Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field is preferred.


Additional information

Contract type:

* PJ (Pessoa Jurídica: contractor engaged through a legal entity)

Work arrangement:

* 100% remote