Role- AI Architect
Location- Brussels, Belgium
Experience level- 15+ Years
Job Description:
1) Architecture & Solution Design
- Define reference architectures for GenAI systems: RAG, agentic orchestration, tool/function calling, multi-step reasoning workflows, memory patterns, and context strategies.
- Design multi-tenant and enterprise-scale GenAI platforms with clear separation of concerns: UI, orchestration, retrieval, inference, evaluation, and observability.
- Select model strategies: hosted LLMs, open-weight models, fine-tuning vs. prompt/RAG, latency and cost tradeoffs, and deployment patterns.
2) Agentic AI Orchestration & Tooling
- Architect agent systems (single/multi-agent), including:
  - Task decomposition, planners/executors, reflection/verification loops
  - Tool use patterns (APIs, databases, search, workflow engines)
  - Guardrails to prevent unsafe tool actions and hallucinated commands
- Build reliable flows for “human-in-the-loop” decision points and approvals (e.g., procurement, customer comms, incident triage).
3) Retrieval, Knowledge Systems & Data Design
- Lead design of knowledge ingestion pipelines: document parsing, chunking strategies, embeddings, metadata, lineage, freshness SLAs.
- Architect vector search and hybrid retrieval: semantic + keyword, reranking, filtering, ACL-aware retrieval.
- Ensure retrieval respects access control, PII handling, data residency, and auditability.
4) Production Engineering, Reliability & Cost
- Set non-functional requirements for GenAI workloads: SLOs, latency budgets, fallback models, caching, rate limiting.
- Design cost controls: prompt/token optimization, model routing, batching, and usage governance.
- Implement resiliency patterns: circuit breakers, retries, queue-based orchestration, idempotency.
5) Security, Risk & Responsible AI
- Establish AI security posture: prompt injection defenses, data exfiltration controls, tool sandboxing.
- Define policies and controls for sensitive data, logging, redaction, encryption, secret management, and auditing.
- Collaborate with risk/compliance to drive model governance, content safety, bias/quality monitoring, and regulatory alignment.
6) Evaluation, Observability & Continuous Improvement
- Create evaluation frameworks: offline evals (golden sets), automated regression, and scenario-based testing.
- Instrument systems for observability: traces, prompt/versioning, retrieval diagnostics, tool-call logs, and outcome metrics.
- Run A/B tests and iterate on prompts, retrieval, and agent policies based on measurable outcomes.
7) Leadership & Stakeholder Management
- Partner with product leaders to identify high-value use cases and define the roadmap.
- Mentor engineers and data scientists on best practices for LLM apps.
- Produce architecture artifacts: ADRs, threat models, system diagrams, runbooks.
Required Skills & Experience:
Core Technical Skills (Must Have)
- 8+ years in software/solution architecture, with 2+ years delivering GenAI/LLM solutions in production.
- Strong knowledge of LLMs: prompting patterns, context windows, tool/function calling, model limitations, and safety risks.
- Agentic AI design experience: orchestrators, workflows, multi-step reasoning, tool usage, HITL patterns.
- RAG expertise: embeddings, vector DBs, hybrid retrieval, reranking, chunking strategies, evaluation.
- Cloud architecture (Azure/AWS/GCP) with production engineering rigor: microservices, containers (Docker/K8s), serverless, CI/CD.
- Solid programming skills (one or more): Python, TypeScript/JavaScript, Java, C#.
- Experience with APIs and integration patterns: REST/gRPC, event-driven systems, queues, workflow engines.
Security & Governance (Must Have)
- Understanding of GenAI-specific threats: prompt injection, data leakage, jailbreaks, insecure tool calling.
- Familiarity with enterprise controls: IAM, key management, encryption, network isolation, audit logging.
- Responsible AI practices: evaluation, content moderation, privacy, and compliance-by-design.
Architecture & Systems Skills (Must Have)
- Distributed system design: scalability, fault tolerance, caching, performance tuning.
- Observability: logging/metrics/tracing, prompt/version tracking, monitoring SLIs/SLOs.
- Cost management and performance optimization: model selection/routing, token reduction, caching, batching.
Preferred / Nice-to-Have Skills:
- Fine-tuning approaches: LoRA/QLoRA, instruction tuning, adapters, distillation (when appropriate).
- Experience with knowledge graphs, semantic layers, enterprise search.
- Advanced evaluation: LLM-as-judge with safeguards, rubric scoring, adversarial testing.
- MLOps/LLMOps toolchains: experiment tracking, feature stores, model registries, data quality tools.
- Domain experience: customer support automation, developer productivity copilots, IT ops agents, finance or healthcare compliance.
- Experience building platforms: reusable agent frameworks, reusable RAG components, multi-team enablement.
For more information on how we process your personal data, please refer to HCLTech’s Candidate Data Privacy Notice.