AI Evaluation Engineer

Location: Gurugram, India

Seniority: Mid-level (3-6 years)

Purpose: Build and execute FNZ's AI evaluation framework and tooling, working under the guidance of the AI Evaluations Team Lead to assess quality, safety, robustness, and operational suitability of AI solutions before release.

Key Responsibilities:

Build, maintain, and evolve core components of FNZ's AI evaluation framework, including test structures, scoring approaches, reusable evaluation patterns, and supporting documentation.
Develop and improve evaluation tooling, harnesses, automation, and CI/CD integrations used to run repeatable assessments across AI agents and workflows.
Execute evaluations across all six pillars of FNZ's evaluation framework: Task Performance, Safety & Compliance, Efficiency, Groundedness & Reasoning, Robustness, and Suitability.
Create and maintain test datasets, golden sets, rubrics, and scoring criteria that reflect expected agent behaviour and business requirements.
Build and run test suites covering baseline behaviour, edge cases, failure modes, and adversarial scenarios for AI agents and workflows.
Assess multi-step agentic workflows, including planning, tool use, execution quality, recovery from errors, and adherence to controls.
Verify groundedness and output quality by checking responses against source content, expected reasoning patterns, and policy constraints.
Document evaluation findings with clear evidence, communicate issues to AI solution teams, and support remediation and re-testing.
Develop deeper expertise in one or two evaluation domains while remaining effective as a generalist across the wider framework.

Skills and Experience:

3-6 years in software engineering, test engineering, AI/ML development, or data science.
Strong programming skills with hands-on experience building test automation, evaluation tooling, or developer productivity tooling; Python or .NET background required, with .NET preferred.
Practical experience evaluating or building LLM applications, RAG systems, or AI agents.
Understanding of prompt engineering, retrieval-augmented generation, agent architectures, and common failure modes in probabilistic systems.
Ability to design structured test approaches, rubrics, and repeatable evaluation workflows for complex AI behaviours.
Analytical mindset to decompose complex agent behaviours and identify weaknesses, edge cases, and opportunities to improve the framework.
Strong documentation and communication skills, with the ability to explain findings and propose practical improvements to evaluation methods and tooling.

About FNZ

FNZ is committed to opening up wealth so that everyone, everywhere can invest in their future on their terms. We know the foundation to do that already exists in the wealth management industry, but complexity holds firms back.

We created wealth’s growth platform to help. We provide a global, end-to-end wealth management platform that integrates modern technology with business and investment operations. All in a regulated financial institution.

We partner with the world’s leading financial institutions, with over US$2.4 trillion in assets on platform (AoP).

Together with our clients, we empower nearly 30 million people across all wealth segments to invest in their future.
