Role: AI Architect
Location: Brussels, Belgium
Experience level: 15+ years
Job Description:
1) Architecture & Solution Design
* Define reference architectures for GenAI systems: RAG, agentic orchestration, tool/function calling, multi-step reasoning workflows, memory patterns, and context strategies.
* Design multi-tenant and enterprise-scale GenAI platforms with clear separation of concerns: UI, orchestration, retrieval, inference, evaluation, and observability.
* Select model strategies: hosted LLMs, open-weight models, fine-tuning vs. prompt/RAG, latency and cost tradeoffs, and deployment patterns.
2) Agentic AI Orchestration & Tooling
* Architect agent systems (single- and multi-agent), including:
  * task decomposition, planners/executors, reflection/verification loops
  * tool-use patterns (APIs, databases, search, workflow engines)
  * guardrails to prevent unsafe tool actions and hallucinated commands
* Build reliable flows for "human-in-the-loop" decision points and approvals (e.g., procurement, customer comms, incident triage).
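As an illustration of the human-in-the-loop pattern described above, a minimal approval-gate sketch. All tool names and the approval policy here are hypothetical, not a prescribed implementation:

```python
# Sketch: an agent's tool calls pass through an approval gate before
# execution; tools marked "sensitive" are parked for a human decision.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToolCall:
    tool: str
    args: dict

@dataclass
class ApprovalGate:
    sensitive_tools: set
    pending: list = field(default_factory=list)

    def submit(self, call: ToolCall, execute: Callable[[ToolCall], str]) -> str:
        if call.tool in self.sensitive_tools:
            self.pending.append(call)          # park for human review
            return f"PENDING approval: {call.tool}"
        return execute(call)                   # non-sensitive tools run directly

    def approve(self, call: ToolCall, execute: Callable[[ToolCall], str]) -> str:
        self.pending.remove(call)              # a human signed off
        return execute(call)

def run_tool(call: ToolCall) -> str:
    # Stand-in for real tool execution (API call, DB write, etc.)
    return f"executed {call.tool} with {call.args}"

gate = ApprovalGate(sensitive_tools={"create_purchase_order"})
print(gate.submit(ToolCall("search_docs", {"q": "SLA"}), run_tool))
print(gate.submit(ToolCall("create_purchase_order", {"amount": 900}), run_tool))
```

In production the pending queue would live in durable storage and the approval step would surface in a review UI, but the gate-before-execute shape is the core of the pattern.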
3) Retrieval, Knowledge Systems & Data Design
* Lead the design of knowledge ingestion pipelines: document parsing, chunking strategies, embeddings, metadata, lineage, and freshness SLAs.
* Architect vector search and hybrid retrieval: semantic + keyword search, reranking, filtering, and ACL-aware retrieval.
* Ensure retrieval respects access control, PII handling, data residency, and auditability.
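The hybrid, ACL-aware retrieval described above can be sketched with toy scoring functions. The documents, vectors, and group names are illustrative; a real system would use a vector database and a proper lexical index:

```python
# Sketch: combine a semantic score with a keyword score, filtering out
# documents the requesting user is not entitled to see (ACL-aware).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, text):
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def retrieve(query, query_vec, docs, user_groups, alpha=0.6, k=3):
    visible = [d for d in docs if d["acl"] & user_groups]   # ACL filter first
    scored = [
        (alpha * cosine(query_vec, d["vec"])
         + (1 - alpha) * keyword_score(query, d["text"]), d)
        for d in visible
    ]
    return [d for _, d in sorted(scored, key=lambda s: -s[0])][:k]

docs = [
    {"id": "kb1", "text": "incident triage runbook", "vec": [1.0, 0.0], "acl": {"ops"}},
    {"id": "kb2", "text": "procurement approval policy", "vec": [0.0, 1.0], "acl": {"finance"}},
]
hits = retrieve("incident runbook", [0.9, 0.1], docs, user_groups={"ops"})
print([d["id"] for d in hits])   # kb2 never surfaces: user lacks "finance"
```

Filtering before scoring matters: documents a user cannot see must never reach the ranking stage, or their content can leak through scores and snippets.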
4) Production Engineering, Reliability & Cost
* Set non-functional requirements for GenAI workloads: SLOs, latency budgets, fallback models, caching, and rate limiting.
* Design cost controls: prompt/token optimization, model routing, batching, and usage governance.
* Implement resiliency patterns: circuit breakers, retries, queue-based orchestration, idempotency.
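One of the resiliency patterns above, a circuit breaker with model fallback, might look like this in outline. The model functions are stand-ins, not real API clients:

```python
# Sketch: a circuit breaker around a primary model call; after repeated
# failures the breaker opens and traffic routes to a cheaper fallback.
class CircuitBreaker:
    def __init__(self, threshold=3):
        self.failures = 0
        self.threshold = threshold

    @property
    def open(self):
        return self.failures >= self.threshold   # stop calling the primary

    def record(self, ok):
        self.failures = 0 if ok else self.failures + 1

def call_with_fallback(prompt, primary, fallback, breaker):
    if not breaker.open:
        try:
            result = primary(prompt)
            breaker.record(ok=True)
            return result
        except RuntimeError:
            breaker.record(ok=False)
    return fallback(prompt)                      # degraded but available

def flaky_primary(prompt):
    raise RuntimeError("timeout")                # simulated outage

def cheap_fallback(prompt):
    return f"[fallback] {prompt}"

breaker = CircuitBreaker(threshold=2)
for _ in range(3):
    answer = call_with_fallback("summarize incident", flaky_primary, cheap_fallback, breaker)
print(answer, breaker.open)
```

A production breaker would also half-open after a cool-down period to probe recovery; the sketch keeps only the open/closed core.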
5) Security, Risk & Responsible AI
* Establish the AI security posture: prompt-injection defenses, data-exfiltration controls, and tool sandboxing.
* Define policies and controls for sensitive data, logging, redaction, encryption, secret management, and auditing.
* Collaborate with risk and compliance teams to drive model governance, content safety, bias/quality monitoring, and regulatory alignment.
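As a small illustration of the redaction-before-logging control mentioned above, a hedged sketch. The regex patterns are illustrative and far from exhaustive; real deployments use dedicated PII-detection tooling:

```python
# Sketch: redact obvious PII (emails, card-like numbers) from prompts
# before they are written to audit logs.
import re

PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),
]

def redact(text):
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

def audit_log(event, prompt):
    # Only the redacted text is ever persisted
    return {"event": event, "prompt": redact(prompt)}

entry = audit_log("chat", "Contact jane.doe@example.com, card 4111 1111 1111 1111")
print(entry["prompt"])
```

The design point is placement: redaction sits between the application and the log sink, so raw prompts never reach storage regardless of which component emits the log line.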
6) Evaluation, Observability & Continuous Improvement
* Create evaluation frameworks: offline evals (golden sets), automated regression, and scenario-based testing.
* Instrument systems for observability: traces, prompt versioning, retrieval diagnostics, tool-call logs, and outcome metrics.
* Run A/B tests and iterate on prompts, retrieval, and agent policies based on measurable outcomes.
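A minimal offline-eval sketch against a golden set, in the spirit of the framework above. The questions, required terms, threshold, and stub model are invented for illustration:

```python
# Sketch: score a model's answers against a golden set and fail the
# regression gate when accuracy drops below a threshold.
def contains_all(answer, required_terms):
    return all(term.lower() in answer.lower() for term in required_terms)

def evaluate(model, golden_set, threshold=0.8):
    passed = sum(
        contains_all(model(case["question"]), case["must_include"])
        for case in golden_set
    )
    accuracy = passed / len(golden_set)
    return {"accuracy": accuracy, "gate_passed": accuracy >= threshold}

golden_set = [
    {"question": "What is our incident SLA?", "must_include": ["30 minutes"]},
    {"question": "Who approves purchases?", "must_include": ["finance"]},
]

def stub_model(question):
    # Stand-in for a real model call
    return "Finance approves; incidents are acknowledged within 30 minutes."

report = evaluate(stub_model, golden_set)
print(report)
```

Wired into CI, a failing gate blocks prompt or retrieval changes from shipping, which is what turns an eval set into a regression suite.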
7) Leadership & Stakeholder Management
* Partner with product leaders to identify high-value use cases and define roadmap.
* Mentor engineers and data scientists on best practices for LLM apps.
* Produce architecture artifacts: ADRs, threat models, system diagrams, runbooks.
Required Skills & Experience
Core Technical Skills (Must Have)
* 8+ years in software/solution architecture, including 2+ years delivering GenAI/LLM solutions in production.
* Strong knowledge of LLMs: prompting patterns, context windows, tool/function calling, model limitations, and safety risks.
* Agentic AI design experience: orchestrators, workflows, multi-step reasoning, tool usage, and HITL patterns.
* RAG expertise: embeddings, vector DBs, hybrid retrieval, reranking, chunking strategies, and evaluation.
* Cloud architecture (Azure/AWS/GCP) with production engineering rigor: microservices, containers (Docker/K8s), serverless, and CI/CD.
* Solid programming skills in one or more of: Python, TypeScript/JavaScript, Java, or C#.
* Experience with APIs and integration patterns: REST/gRPC, event-driven systems, queues, and workflow engines.
Security & Governance (Must Have)
* Understanding of GenAI-specific threats: prompt injection, data leakage, jailbreaks, and insecure tool calling.
* Familiarity with enterprise controls: IAM, key management, encryption, network isolation, and audit logging.
* Responsible AI practices: evaluation, content moderation, privacy, and compliance-by-design.
Architecture & Systems Skills (Must Have)
* Distributed system design: scalability, fault tolerance, caching, and performance tuning.
* Observability: logging/metrics/tracing, prompt/version tracking, and monitoring of SLIs/SLOs.
* Cost management and performance optimization: model selection/routing, token reduction, caching, and batching.
Preferred / Nice-to-Have Skills
* Fine-tuning approaches: LoRA/QLoRA, instruction tuning, adapters, and distillation (when appropriate).
* Experience with knowledge graphs, semantic layers, and enterprise search.
* Advanced evaluation: LLM-as-judge with safeguards, rubric scoring, and adversarial testing.
* MLOps/LLMOps toolchains: experiment tracking, feature stores, model registries, and data quality tools.
* Domain experience: customer support automation, developer productivity copilots, IT ops agents, or finance/healthcare compliance.
* Experience building platforms: reusable agent frameworks, reusable RAG components, and multi-team enablement.
For more information on how we process your personal data, please refer to HCLTech's Candidate Data Privacy Notice.