Cialfo
Senior Data Engineer
Job description
About Manifest Global
Manifest Global is building the infrastructure for global human capital mobility - connecting students, schools, universities, and employers across 50+ countries. Our portfolio spans Cialfo (AI-powered college counseling, 2,000+ schools), BridgeU (university guidance for international schools globally), Kaaiser (trusted study abroad counseling across India and Southeast Asia), and Explore (AI-powered university outreach, 1,000+ university partners). Together, we move talent across borders at scale. $80M raised. Still early.
About This Role
Manifest Global operates four brands across 50+ countries, generating data across thousands of schools, hundreds of thousands of students, and 1,000+ university partners. Counselor behaviour, student application journeys, university conversion rates, placement outcomes, attribution revenue - it's all there. The data exists. The question is whether the infrastructure around it is good enough to make it useful.
Right now, the data platform works. Pipelines run, the warehouse holds data, the BI layer surfaces reports. But Manifest is growing - new brands, new markets, new activation use cases - and the infrastructure needs to scale with it. There are pipelines that need to be more reliable. Transformation logic that needs to be cleaner. Warehouse design that needs to handle more volume without degrading performance. And an activation layer - reverse ETL, operational analytics, data flowing into the tools the business actually uses - is still being built.
As a Senior Data Engineer, you will own significant parts of the data platform end to end - ingestion, transformation, warehouse, activation - and you will be one of the people who determines whether Manifest's data infrastructure is a genuine competitive advantage or a persistent constraint. You will work closely with Principal Engineers, Product, and business stakeholders across all four brands, and you will be expected to operate with the ownership and judgment of someone who has built production-grade data systems before.
What makes this role different: Manifest has real data - cross-brand, multi-geography, commercially significant data. The stack is modern: Snowflake, dbt, Hevo, Airtable, Metabase. The problems are real. And when the data infrastructure surfaces the right insight, it changes a decision that affects real students and real institutions.
AI is central to how we build: This isn't just a data engineering role - it is a role where you will actively design and build AI infrastructure that accelerates the team's own development velocity. We use Snowflake Cortex AI with Claude in our daily engineering workflow - for debugging, RCA, query optimisation, and pipeline analysis. We have already cut root cause analysis time. The next step is embedding AI deeper: automated ticket handling, intelligent monitoring, and AI-assisted development tooling that lets the team move faster without sacrificing reliability.
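To make that concrete, here is a minimal sketch of the kind of Cortex call this workflow builds on - the model name, prompt, and query shape are illustrative, not a fixed part of our stack:

```sql
-- Illustrative sketch: ask Cortex to propose a root cause for the most
-- recent failed task run. Model name and prompt wording are examples only.
WITH last_failure AS (
    SELECT error_message
    FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY())
    WHERE state = 'FAILED'
    ORDER BY completed_time DESC
    LIMIT 1
)
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'claude-3-5-sonnet',
    'Suggest a likely root cause for this Snowflake task failure: ' || error_message
) AS suggested_rca
FROM last_failure;
```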
What You Will Own
1. AI Infrastructure for Data Engineering
- Design and build AI-assisted development tooling - LLM-powered code generation for dbt models, SQL transformations, and pipeline scaffolding that dramatically reduces time-to-production for new data assets
- Build intelligent data quality and anomaly detection systems - AI-driven monitoring that learns normal patterns across pipelines and surfaces anomalies before they propagate downstream, replacing manual threshold-based alerting (see the sketch after this list)
- Implement AI-augmented data cataloguing and lineage - automated documentation generation, schema understanding, and semantic tagging so engineers spend less time writing docs and more time building
- Develop AI-powered pipeline debugging and root cause analysis - tooling that diagnoses failures, traces impact through the DAG, and proposes fixes rather than requiring engineers to trace failures manually
- Build and maintain the infrastructure that supports AI features - vector stores, embedding pipelines, retrieval layers, and model serving infrastructure that powers AI capabilities across Cialfo, BridgeU, and Explore
- Evaluate and adopt emerging AI developer tools - stay ahead of how AI tooling (Claude, Cortex AI, GitHub Copilot, LLM APIs) can be embedded into the team's workflow to shorten feedback loops and accelerate feature delivery
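As a rough sketch of the anomaly-detection item above - assuming a hypothetical audit table pipeline_load_stats(pipeline, load_date, row_count) - a learned statistical baseline can replace a hard-coded threshold:

```sql
-- Sketch only: flag daily row counts that deviate more than 3 sigma from
-- a trailing 28-day baseline. Table and column names are hypothetical.
WITH baseline AS (
    SELECT
        pipeline,
        load_date,
        row_count,
        AVG(row_count) OVER (
            PARTITION BY pipeline ORDER BY load_date
            ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING
        ) AS avg_rows,
        STDDEV(row_count) OVER (
            PARTITION BY pipeline ORDER BY load_date
            ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING
        ) AS std_rows
    FROM pipeline_load_stats
)
SELECT pipeline, load_date, row_count
FROM baseline
WHERE std_rows > 0
  AND ABS(row_count - avg_rows) > 3 * std_rows;
```

The window length and sigma threshold are tuning choices; in production this feeds alerting rather than ad-hoc queries.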
2. Data Warehouse Design, Cost & Maintenance
- Own significant portions of the Snowflake data warehouse - schema design, performance optimisation, and the integrity of the data models that the rest of the stack depends on
- Apply strong data warehousing methodologies: dimensional modelling, layered transformation logic, clear separation between raw, staged, and served layers
- Design and build cross-brand data primitives - shared, canonical data layers for K12, Student, University, and Application data that work consistently across Cialfo, BridgeU, and Kaaiser. This is active work and a critical foundation for the multi-brand data platform
- Own Snowflake cost optimisation - monitor warehouse spend, identify high-cost queries and sync jobs, right-size warehouse configurations, and drive measurable reductions in monthly compute spend (see the sketch after this list)
- Ensure the warehouse handles increasing data volumes from across all four brands without degrading query performance or downstream reliability
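To give the cost work some texture, a first pass often starts with a query like this against the standard SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view (the window and grouping are illustrative):

```sql
-- Sketch only: rank repeated query texts by total elapsed time over the
-- last 30 days to find optimisation and right-sizing candidates.
SELECT
    warehouse_name,
    query_text,
    COUNT(*)                       AS runs,
    SUM(total_elapsed_time) / 1000 AS total_seconds
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, query_text
ORDER BY total_seconds DESC
LIMIT 20;
```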
3. ETL/ELT Pipelines, Scheduling & Transformation Logic
- Design, build, and maintain production-grade data pipelines - ingestion via Hevo or similar, transformation via dbt, SQL-based logic that is clean, documented, and maintainable
- Design and manage Snowflake Task DAGs - build and maintain dependency-chained task graphs for ingestion, LLM processing, and sync workflows. Understand how to structure root tasks, child tasks, CRON scheduling, warehouse assignment, and failure isolation so pipelines don't cascade-fail (see the sketch after this list)
- Own Airtable as an operational data layer - manage Snowflake-to-Airtable and Airtable-to-Snowflake sync workflows, including the SNOWFLAKE_TO_AIRTABLE_TABLELIST sync config, upsert logic, incremental filters, and sync cost optimisation. Airtable is both a key data source and a reporting destination across brands
- Build and own reverse ETL workflows that activate warehouse data into operational tools - getting the right data into the hands of the teams that need it, not just into dashboards
- Take full ownership of pipeline failures: root cause identification, fix, downstream impact analysis, and prevention - not just resolution
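To make the Task DAG work concrete, here is a minimal two-node sketch - the task names, schedule, warehouse, and stored procedures are all hypothetical:

```sql
-- Sketch only: a root task on a CRON schedule with one dependent child.
CREATE OR REPLACE TASK ingest_root
    WAREHOUSE = transform_wh
    SCHEDULE  = 'USING CRON 0 2 * * * UTC'  -- daily at 02:00 UTC
AS
    CALL load_raw_sources();                -- hypothetical ingestion procedure

CREATE OR REPLACE TASK transform_child
    WAREHOUSE = transform_wh
    AFTER ingest_root                       -- runs once the root completes successfully
AS
    CALL run_transformations();             -- hypothetical transform procedure

-- Tasks are created suspended; resume children before the root.
ALTER TASK transform_child RESUME;
ALTER TASK ingest_root RESUME;
```

Keeping children on the AFTER dependency rather than on their own schedules is what gives you failure isolation: when the root run fails, the child is skipped instead of being fed stale data.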
4. Data Quality & Reliability
- Define and enforce data quality standards across datasets you own - automated validations, delta checks, row counts, time-based monitoring (see the sketch after this list)
- Build monitoring and alerting that surfaces problems before they reach the business
- Document data lineage, transformation assumptions, and technical decisions so the platform is understandable and maintainable as the team grows
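A minimal sketch of the delta-check idea, against a hypothetical served-layer table applications_served(load_date, ...) - the query returns a row only when the check fails, which is what alerting hooks into:

```sql
-- Sketch only: flag the load if today's row count falls below half of
-- yesterday's. Table name and the 0.5 ratio are illustrative.
SELECT
    COUNT_IF(load_date = CURRENT_DATE)     AS rows_today,
    COUNT_IF(load_date = CURRENT_DATE - 1) AS rows_yesterday
FROM applications_served
HAVING COUNT_IF(load_date = CURRENT_DATE)
     < 0.5 * COUNT_IF(load_date = CURRENT_DATE - 1);
```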
5. BI Platform & Reporting
- Maintain and enhance the existing Metabase BI layer and build reporting interfaces that non-technical stakeholders can actually use
- Collaborate with Product and Analytics teams to translate business needs into reliable technical solutions
- Communicate clearly across all four brands - be the person who can explain a data problem in business terms and a business problem in data terms
What Success Looks Like
You'll start by building a complete picture of the current state - which pipelines are fragile, where data quality is inconsistent, what the highest-impact improvements look like, and where activation use cases aren't yet built. You will have a point of view on where to move first.
From there, the infrastructure will be measurably more reliable. Pipelines that were breaking will run consistently. Data quality issues will be caught early or prevented entirely. The BI layer will be getting used - not just maintained.
Over time, the data platform will be something the business genuinely relies on - fast enough to support the pace of growth, reliable enough that data consumers trust what they are looking at, and built in a way that the next engineer who joins can understand and extend without starting from scratch.
About You
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field - or equivalent practical experience
Experience
- 5+ years building and maintaining production-grade data pipelines and data warehouses - not prototypes, but systems that real business decisions depend on
- Strong experience architecting and implementing large-scale business intelligence solutions
- Strong experience in data warehouse design
- Hands-on experience with the modern data stack: Snowflake (or an equivalent cloud warehouse), dbt, and an ingestion tool such as Hevo or Airbyte
- Advanced SQL proficiency: CTEs, window functions, QUALIFY, query optimisation, and understanding the difference between SQL that works and SQL that scales (see the example after this list)
- Experience with Snowflake-native features: Task DAGs, Dynamic Tables, Snowflake Cortex AI, AI_COMPLETE, warehouse sizing, and query profiling
- Experience with Airtable as an operational data layer - syncing data between Snowflake and Airtable, managing upsert logic, and keeping sync costs low
- Experience owning Snowflake cost monitoring and optimisation - identifying expensive queries, bloated sync jobs, and warehouse over-provisioning
- Experience with user-facing BI tools: Metabase, Looker, or similar
- Experience working in agile, ticket-based workflows (Jira)
- Experience building or integrating AI/LLM tooling into data engineering workflows - specifically Snowflake Cortex AI, or LLM APIs (OpenAI, Anthropic, etc.) for structured data tasks, fuzzy matching, or pipeline automation
- Familiarity with vector databases, embedding pipelines, or retrieval-augmented generation (RAG) infrastructure is a strong plus
- Exposure to MLOps or AI infrastructure patterns - model serving, feature stores, or AI monitoring - is an advantage
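As a flavour of the SQL bar here - an example, not a test - QUALIFY collapses the classic "latest record per entity" pattern that would otherwise need a self-join or nested subquery (table and columns are hypothetical):

```sql
-- Sketch only: the latest application status per student via QUALIFY.
SELECT student_id, status, updated_at
FROM application_events
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY student_id
    ORDER BY updated_at DESC
) = 1;
```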
Skills & Qualities
- You take ownership of reliability. When a pipeline fails, you don't just fix the immediate problem - you understand why it happened, what it affected downstream, and what needs to change so it doesn't happen again
- You communicate clearly with non-technical stakeholders. You've sat in a requirements conversation with a commercial or product team, understood what they were actually asking for, and built something that answered the real question
- You document your work well enough that someone else could maintain it - data lineage, transformation assumptions, technical decisions
- Comfortable working in a multi-brand, multi-stakeholder environment where data problems span different systems, teams, and geographies
- Strong problem-solving skills and attention to detail
- Ability to balance multiple priorities and make sound judgments about what to work on first
- You're genuinely excited about AI as a force multiplier for engineering teams - you actively use AI tools in your own workflow and have a point of view on how they can make data teams faster without sacrificing reliability
Why Manifest
We're building the infrastructure for global human capital mobility - the rails that move students, schools, universities, and employers across 50+ countries. Cialfo is in 2,000+ schools. Explore is trusted by 1,000+ universities. BridgeU runs across the UK, Europe, and the Middle East. Kaaiser has guided students across India and Southeast Asia since 1997.
The opportunity is real. $700B flows annually in remittances from migrant workers. 85M workers will be missing from developed economies by 2030. We're building the operating system that changes that.
$80M raised from Tiger Global, SIG, and Square Peg. Still early.
The team has already built the infrastructure for AI-native engineering - shared conventions, a live skills library, AI-assisted workflows across engineering, QE, product, and design. Saige is in production. Explore's AI capabilities are in production. This isn't an aspiration we're hiring you to bring to life. It's an operating system we're hiring you to extend, scale, and make permanent.


