About us

Articul8 was born from a simple belief: GenAI should work for the enterprise, not the other way around. Our platform combines domain-specific models, autonomous agentic reasoning through ModelMesh(TM), reliable model evaluation through LLM-IQ(TM), and multimodal understanding to serve regulated industries including energy, semiconductor, finance, aerospace, and supply chain. Trusted by Fortune 500 enterprises, we bring together research, engineering, product, and domain expertise to deliver AI that meets the accuracy, explainability, and auditability standards that high-stakes environments demand.

Job Description

Articul8 AI is seeking a Staff Applied AI Researcher to define how our platform reasons at runtime and how autonomous systems make trustworthy decisions in production. You will lead research across the core runtime intelligence capabilities behind ModelMesh(TM): task decomposition, agent coordination, model and tool routing, probabilistic decisioning, verification, observability-aware execution, and the evaluation methods that determine whether autonomous behavior is reliable enough for enterprise use.

Responsibilities

Set technical direction for agentic reasoning systems and runtime intelligence across ModelMesh(TM): define the orchestration strategies, decision policies, verification approaches, and runtime quality standards that determine how massively parallel agent systems reason, coordinate, and self-correct in production
Architect the infrastructure for researcher augmentation at scale — design the agentic platforms and orchestration primitives that enable every researcher and engineer at Articul8 to deploy fleets of AI agents for experimentation, evaluation, and production integration — multiplying the depth, breadth, and velocity of the entire organization
Go deep: advance the science of autonomous reasoning — design, train, and refine the learned components behind runtime decisioning (routing models, verification models, confidence estimators, reward models, policy selectors), using massively parallel agent-driven experiment pipelines to explore architectural and algorithmic frontiers exhaustively
Go broad: unify perception, retrieval, reasoning, and action — build repeatable methodology for composing domain-specific models, data perception systems, knowledge graphs, retrieval layers, and external tools into coherent agentic workflows, delegating integration testing and cross-modal benchmarking to parallel agent systems so you can reason across the full stack simultaneously
Drive research on agent reliability for regulated environments — lead failure detection, self-checking, verification workflows, compounding error analysis, and auditable autonomous behavior research, using agent-orchestrated stress testing and red-teaming at scales that manual evaluation cannot reach
Define evaluation methodology for runtime intelligence — establish how task success, decision quality, robustness, traceability, and failure recovery are measured under realistic enterprise conditions, building agentic evaluation harnesses that run continuously and surface regressions before they reach customers
Influence platform-level architecture — shape decisions on model routing, tool use, observability, governance, access control, and interoperability with external agent ecosystems, ensuring the platform is designed for humans and agents to amplify each other
Mentor researchers across levels in the agentic paradigm — raise the bar on technical judgment, experimental rigor, and agent-augmented research practice; contribute to hiring researchers who are driven to maximize their human potential
Maintain hands-on research impact — sustain a meaningful personal research contribution through technical work, publications, patents, and externally visible output, modeling what it looks like to be a deeply technical leader who uses agentic systems to go deeper and faster than ever before

Required Qualifications

Education: PhD or MSc in Computer Science, Machine Learning, AI, Robotics, or a related field.
Experience: 8+ years in AI/ML research with demonstrated impact on production systems, including 3+ years building LLM-based or autonomous AI systems.
Reasoning and orchestration: Deep hands-on experience in at least two of: multi-agent coordination, planning under uncertainty, sequential decision-making, probabilistic inference, model routing, or tool-using agent systems. You've built systems where multiple models must collaborate to produce a reliable outcome.
Evaluation of autonomous systems: You have designed evaluation frameworks for systems where correctness is not binary — measuring decision quality, reliability under distribution shift, compounding error rates, and failure recovery in production-like conditions.
Systems at scale: You have designed and operated research systems that integrate multiple models, data sources, and control mechanisms in production or near-production settings. You understand the difference between a demo and a system.
Software engineering: Proficient in Python with strong software architecture instincts. Your systems are maintainable, testable, and operable.
Technical leadership: You have set technical direction for a research area, mentored researchers, and influenced quality standards beyond your immediate team.

Preferred Qualifications

Experience building orchestration systems with non-trivial control flow — dynamic routing, verification loops, probabilistic gating — not just prompt chaining or fixed DAGs.
Background in probabilistic modeling, Bayesian inference, control theory, or formal verification applied to ML systems — you can reason about uncertainty, not just measure it.
Experience with reliability engineering for autonomous AI in regulated environments: observability, safety constraints, graceful degradation, and audit trails.
Track record of integrating heterogeneous components (retrieval, knowledge graphs, domain models, external APIs) into systems that are more reliable than their individual parts.
Strong publication record with evidence of sustained, focused research impact — not just breadth.
Experience taking reasoning or agent systems from prototype to production serving real enterprise customers.
Domain familiarity in energy, semiconductor, finance, aerospace, telecom, or supply chain.

Professional Attributes (Code42)

Practice Humility: You recognize that setting technical direction is a responsibility, not a status. You change your mind publicly when the evidence demands it and build a team culture where the best idea wins regardless of who proposed it.
Bias for Outcomes: You define success by customer and platform impact, not research novelty alone. You make hard prioritization calls and hold yourself accountable for whether the team's work moved the needle.
Care Deeply: You take personal responsibility for the reliability and trustworthiness of the systems your team builds. You invest in the people around you — their growth, their clarity, their ability to do their best work — because that's how real quality is sustained.
Dare to Do the Impossible & Embrace Scarcity: You take on problems that don't have known solutions and structure them into tractable research programs. You don't wait for perfect resources — you build with what you have and make the case for what you need with results, not requests.
Build a Better World: You ensure that the autonomous systems you build are worthy of the trust enterprises place in them. You hold the team to standards of auditability, reliability, and fairness that go beyond what's required — because you believe the bar should be set by builders, not regulators.

Staff Applied AI Researcher - Agentic Reasoning Systems (Brazil)
