Overview

We are looking for a hands-on **Generative AI Technical Architect** who will own the end-to-end architecture of enterprise-scale, knowledge-intensive, agentic AI systems. This is a high-impact role focused on building production-grade Retrieval-Augmented Generation (RAG), Corrective/Controllable-Augmented Generation (CAG), multi-agent frameworks, long-term memory systems, NL2SQL engines, and Small Language Model (SLM)-powered edge/agent deployments using modern ecosystems (LangChain, LlamaIndex, CrewAI, AutoGen, Haystack, DSPy, etc.).

Responsibilities

Architect and own the enterprise GenAI platform with advanced RAG/CAG pipelines (hybrid search, re-ranking, query rewriting, hypothetical document embeddings (HyDE), parent-child retrieval, knowledge graph + vector fusion).
Design and scale multi-agent / agentic workflows (reasoning + acting, tool use, multi-agent collaboration, hierarchical agents, long-running agents with persistence).
Build production-grade long-term and short-term memory systems (vector stores with metadata filtering, session summarization, entity memory, reflection/memory consolidation).
Lead architecture of enterprise Knowledge Bases (ingestion pipelines, chunking strategies, metadata enrichment, incremental updates, multi-tenant KB isolation).
Own NL2SQL / Text-to-SQL architecture (schema linking, few-shot prompting, self-correction, execution feedback loops, SQL guardrails, multi-database support).
Design and deploy Small Language Models (SLMs) for on-device, low-latency, or cost-sensitive agent use cases (Phi-3, Gemma-2B, Mistral-7B, Llama-3.1-8B quantized, TinyLlama, MobileBERT variants).
Define the standard GenAI framework stack (LangChain / LlamaIndex / LangGraph / CrewAI / AutoGen / Microsoft Semantic Kernel / Haystack / DSPy) and create internal libraries/SDKs for the entire organization.
Build observability, tracing, and evaluation frameworks for RAG (RAGAS, TruLens, DeepEval), agents (AgentOps), and NL2SQL accuracy.
Establish governance: prompt injection defense, output sanitization, PII redaction, citation verification, hallucination detection, and enterprise guardrails.
Performance engineering: latency optimization (speculative decoding, caching, batching, query routing), cost optimization (SLM routing, fallback strategies), and multi-region deployment.
Drive GenAI platform roadmap, conduct architecture reviews, and mentor senior engineers building RAG/agent products.

Requirements

1+ years building and shipping production RAG/CAG systems used by 100K+ daily active users.
Deep expertise in modern retrieval techniques: dense (bge, e5), sparse (BM25, SPLADE), late interaction (ColBERT), hybrid, re-ranking (cross-encoders, Cohere Rerank, bge-reranker), and sentence transformers.
Proven track record designing and scaling agentic systems with tool calling, planning (ReAct, Plan-and-Execute, Reflexion), and multi-agent orchestration.
Hands-on experience with vector databases at scale (Pinecone, Weaviate, Milvus, Zilliz, Qdrant, PGVector, Redis, Vespa, Elasticsearch with vector support).
Expert in LangChain, LangGraph, LlamaIndex, CrewAI, AutoGen, and DSPy — including custom node creation, memory modules, and production deployment patterns.
Built production NL2SQL systems (accuracy >92% on Spider/BIRD-style benchmarks over real enterprise schemas).
Deployed SLMs in production (quantized 4-bit/8-bit, ONNX/TensorRT-LLM export, edge deployment).
Strong Python, async frameworks, FastAPI, graph databases (Neo4j, FalkorDB for knowledge graphs), and Kubernetes.

### Preferred (Significant Advantage)

Previously defined the GenAI/RAG/agent stack for a unicorn or large enterprise (Jasper, Glean, Adept, Cresta, Moveworks, Salesforce Einstein, Microsoft Copilot team, etc.).
Contributions to LangChain, LlamaIndex, Haystack, or RAGAS open-source repositories.
Built enterprise knowledge bases processing 10M+ documents with sub-second retrieval latency.
Experience with controllable generation (CAG), guided generation (Outlines, Guidance, LMQL), and structured output enforcement.

If you have architected and shipped multiple enterprise RAG + Agent + NL2SQL + Memory systems that are live in production today, and you live and breathe LangChain/LlamaIndex every day — this is your role.
