prodapt

Generative AI Technical Architect

  • Company: prodapt
  • Role: Generative AI Technical Architect
  • Job type: not specified
  • Found on Mokaru: 16 hours ago
  • Salary: Not disclosed by employer

Job description

Overview

We are looking for a hands-on **Generative AI Technical Architect** who will own the end-to-end architecture of enterprise-scale, knowledge-intensive, agentic AI systems. This is a high-impact role focused on building production-grade Retrieval-Augmented Generation (RAG), Corrective/Controllable-Augmented Generation (CAG), multi-agent frameworks, long-term memory systems, NL2SQL engines, and Small Language Model (SLM)-powered edge/agent deployments using modern ecosystems (LangChain, LlamaIndex, CrewAI, AutoGen, Haystack, DSPy, etc.).

Responsibilities

  • Architect and own the enterprise GenAI platform with advanced RAG/CAG pipelines (hybrid search, re-ranking, query rewriting, hypothetical document embeddings (HyDE), parent-child retrieval, knowledge graph + vector fusion).
  • Design and scale multi-agent / agentic workflows (reasoning + acting, tool use, multi-agent collaboration, hierarchical agents, long-running agents with persistence).
  • Build production-grade long-term and short-term memory systems (vector stores with metadata filtering, session summarization, entity memory, reflection/memory consolidation).
  • Lead architecture of enterprise Knowledge Bases (ingestion pipelines, chunking strategies, metadata enrichment, incremental updates, multi-tenant KB isolation).
  • Own NL2SQL / Text-to-SQL architecture (schema linking, few-shot prompting, self-correction, execution feedback loops, SQL guardrails, multi-database support).
  • Design and deploy Small Language Models (SLMs) for on-device, low-latency, or cost-sensitive agent use cases (Phi-3, Gemma-2B, Mistral-7B, Llama-3.1-8B quantized, TinyLlama, MobileBERT variants).
  • Define the standard GenAI framework stack (LangChain / LlamaIndex / LangGraph / CrewAI / AutoGen / Microsoft Semantic Kernel / Haystack / DSPy) and create internal libraries/SDKs for the entire organization.
  • Build observability, tracing, and evaluation frameworks for RAG (RAGAS, TruLens, DeepEval), agents (AgentOps), and NL2SQL accuracy.
  • Establish governance: prompt injection defense, output sanitization, PII redaction, citation verification, hallucination detection, and enterprise guardrails.
  • Performance engineering: latency optimization (speculative decoding, caching, batching, query routing), cost optimization (SLM routing, fallback strategies), and multi-region deployment.
  • Drive GenAI platform roadmap, conduct architecture reviews, and mentor senior engineers building RAG/agent products.

Requirements

  • 1+ years building and shipping production RAG/CAG systems used by 100K+ daily active users.
  • Deep expertise in modern retrieval techniques: dense (ColBERT, bge, e5), sparse (BM25, SPLADE), hybrid, re-ranking (cross-encoders, Cohere Rerank, bge-reranker), sentence transformers, and late interaction models.
  • Proven track record designing and scaling agentic systems with tool calling, planning (ReAct, Plan-and-Execute, Reflexion), and multi-agent orchestration.
  • Hands-on experience with vector databases at scale (Pinecone, Weaviate, Milvus, Zilliz, Qdrant, PGVector, Redis, Vespa, Elasticsearch with vector support).
  • Expert in LangChain, LangGraph, LlamaIndex, CrewAI, AutoGen, and DSPy, including custom node creation, memory modules, and production deployment patterns.
  • Production NL2SQL systems (accuracy >92% on Spider/BIRD benchmarks in real enterprise schemas).
  • Deployed SLMs in production (quantized 4-bit/8-bit, ONNX/TensorRT-LLM export, edge deployment).
  • Strong Python, async frameworks, FastAPI, graph databases (Neo4j, FalkorDB for knowledge graphs), and Kubernetes.

Preferred (Significant Advantage)

  • Previously defined the GenAI/RAG/agent stack for a unicorn or large enterprise (Jasper, Glean, Adept, Cresta, Moveworks, Salesforce Einstein, Microsoft Copilot team, etc.).
  • Contributions to LangChain, LlamaIndex, Haystack, or RAGAS open-source repositories.
  • Built enterprise knowledge bases processing 10M+ documents with sub-second retrieval latency.
  • Experience with controllable generation (CAG), guided generation (Outlines, Guidance, LMQL), and structured output enforcement.

If you have architected and shipped multiple enterprise RAG + Agent + NL2SQL + Memory systems that are live in production today, and you live and breathe LangChain/LlamaIndex every day, this is your role.
