Cialfo
Senior Data Engineer
Job description
About Manifest Global
Manifest Global is building the infrastructure for global human capital mobility - connecting students, schools, universities, and employers across 50+ countries. Our portfolio spans Cialfo (AI-powered college counseling, 2,000+ schools), BridgeU (university guidance for international schools globally), Kaaiser (trusted study abroad counseling across India and Southeast Asia), and Explore (AI-powered university outreach, 1,000+ university partners). Together, we move talent across borders at scale. $80M raised. Still early.
About This Role
Manifest Global operates four brands across 50+ countries, generating data across thousands of schools, hundreds of thousands of students, and 1,000+ university partners. Counselor behaviour, student application journeys, university conversion rates, placement outcomes, attribution revenue - it's all there. The data exists. The question is whether the infrastructure around it is good enough to make it useful.
Right now, the data platform works. Pipelines run, the warehouse holds data, the BI layer surfaces reports. But Manifest is growing - new brands, new markets, new activation use cases - and the infrastructure needs to scale with it. There are pipelines that need to be more reliable. Transformation logic that needs to be cleaner. Warehouse design that needs to handle more volume without degrading performance. And an activation layer - reverse ETL, operational analytics, data flowing into the tools the business actually uses - is still being built.
As a Senior Data Engineer, you will own significant parts of the data platform end to end - ingestion, transformation, warehouse, activation - and you will be one of the people who determines whether Manifest's data infrastructure is a genuine competitive advantage or a persistent constraint. You will work closely with Principal Engineers, Product, and business stakeholders across all four brands, and you will be expected to operate with the ownership and judgment of someone who has built production-grade data systems before.
What makes this role different: Manifest has real data - cross-brand, multi-geography, commercially significant data. The stack is modern: Snowflake, dbt, Hevo, Airtable, Metabase. The problems are real. And when the data infrastructure surfaces the right insight, it changes a decision that affects real students and real institutions.
AI is central to how we build: This isn't just a data engineering role - it is a role where you will actively design and build AI infrastructure that accelerates the team's own development velocity. We use Snowflake Cortex AI with Claude in our daily engineering workflow - for debugging, RCA, query optimisation, and pipeline analysis. We have already cut root cause analysis time. The next step is embedding AI deeper: automated ticket handling, intelligent monitoring, and AI-assisted development tooling that lets the team move faster without sacrificing reliability.
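To make that concrete, here is a minimal sketch of the kind of Cortex call this workflow builds on - the model name, prompt, and query shape are illustrative, not a fixed part of our stack:

```sql
-- Illustrative sketch: ask Cortex to propose a root cause for the most
-- recent failed task run. Model name and prompt wording are examples only.
WITH last_failure AS (
    SELECT error_message
    FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY())
    WHERE state = 'FAILED'
    ORDER BY completed_time DESC
    LIMIT 1
)
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'claude-3-5-sonnet',
    'Suggest a likely root cause for this Snowflake task failure: ' || error_message
) AS suggested_rca
FROM last_failure;
```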
What You Will Own
1. AI Infrastructure for Data Engineering
- Design and build AI-assisted development tooling - LLM-powered code generation for dbt models, SQL transformations, and pipeline scaffolding that dramatically reduces time-to-production for new data assets
- Build intelligent data quality and anomaly detection systems - AI-driven monitoring that learns normal patterns across pipelines and surfaces anomalies before they propagate downstream, replacing manual threshold-based alerting (see the sketch after this list)
- Implement AI-augmented data cataloguing and lineage - automated documentation generation, schema understanding, and semantic tagging so engineers spend less time writing docs and more time building
- Develop AI-powered pipeline debugging and root cause analysis - tooling that diagnoses failures, traces impact through the DAG, and proposes fixes rather than requiring engineers to trace failures manually
- Build and maintain the infrastructure that supports AI features - vector stores, embedding pipelines, retrieval layers, and model serving infrastructure that powers AI capabilities across Cialfo, BridgeU, and Explore
- Evaluate and adopt emerging AI developer tools - stay ahead of how AI tooling (Claude, Cortex AI, GitHub Copilot, LLM APIs) can be embedded into the team's workflow to shorten feedback loops and accelerate feature delivery
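As a rough sketch of the anomaly-detection item above - assuming a hypothetical audit table pipeline_load_stats(pipeline, load_date, row_count) - a learned statistical baseline can replace a hard-coded threshold:

```sql
-- Sketch only: flag daily row counts that deviate more than 3 sigma from
-- a trailing 28-day baseline. Table and column names are hypothetical.
WITH baseline AS (
    SELECT
        pipeline,
        load_date,
        row_count,
        AVG(row_count) OVER (
            PARTITION BY pipeline ORDER BY load_date
            ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING
        ) AS avg_rows,
        STDDEV(row_count) OVER (
            PARTITION BY pipeline ORDER BY load_date
            ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING
        ) AS std_rows
    FROM pipeline_load_stats
)
SELECT pipeline, load_date, row_count
FROM baseline
WHERE std_rows > 0
  AND ABS(row_count - avg_rows) > 3 * std_rows;
```

The window length and sigma threshold are tuning choices; in production this feeds alerting rather than ad-hoc queries.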
2. Data Warehouse Design, Cost & Maintenance
- Own significant portions of the Snowflake data warehouse - schema design, performance optimisation, and the integrity of the data models that the rest of the stack depends on
- Apply strong data warehousing methodologies: dimensional modelling, layered transformation logic, clear separation between raw, staged, and served layers
- Design and build cross-brand data primitives - shared, canonical data layers for K12, Student, University, and Application data that work consistently across Cialfo, BridgeU, and Kaaiser. This is active work and a critical foundation for the multi-brand data platform
- Own Snowflake cost optimisation - monitor warehouse spend, identify high-cost queries and sync jobs, right-size warehouse configurations, and drive measurable reductions in monthly compute spend (see the sketch after this list)
- Ensure the warehouse handles increasing data volumes from across all four brands without degrading query performance or downstream reliability
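To give the cost work some texture, a first pass often starts with a query like this against the standard SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view (the window and grouping are illustrative):

```sql
-- Sketch only: rank repeated query texts by total elapsed time over the
-- last 30 days to find optimisation and right-sizing candidates.
SELECT
    warehouse_name,
    query_text,
    COUNT(*)                       AS runs,
    SUM(total_elapsed_time) / 1000 AS total_seconds
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, query_text
ORDER BY total_seconds DESC
LIMIT 20;
```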
3. ETL/ELT Pipelines, Scheduling & Transformation Logic
- Design, build, and maintain production-grade data pipelines - ingestion via Hevo or similar, transformation via dbt, SQL-based logic that is clean, documented, and maintainable
- Design and manage Snowflake Task DAGs - build and maintain dependency-chained task graphs for ingestion, LLM processing, and sync workflows. Understand how to structure root tasks, child tasks, CRON scheduling, warehouse assignment, and failure isolation so pipelines don't cascade-fail (see the sketch after this list)
- Own Airtable as an operational data layer - manage Snowflake-to-Airtable and Airtable-to-Snowflake sync workflows, including the SNOWFLAKE_TO_AIRTABLE_TABLELIST sync config, upsert logic, incremental filters, and sync cost optimisation. Airtable is both a key data source and a reporting destination across brands
- Build and own reverse ETL workflows that activate warehouse data into operational tools - getting the right data into the hands of the teams that need it, not just into dashboards
- Take full ownership of pipeline failures: root cause identification, fix, downstream impact analysis, and prevention - not just resolution
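To make the Task DAG work concrete, here is a minimal two-node sketch - the task names, schedule, warehouse, and stored procedures are all hypothetical:

```sql
-- Sketch only: a root task on a CRON schedule with one dependent child.
CREATE OR REPLACE TASK ingest_root
    WAREHOUSE = transform_wh
    SCHEDULE  = 'USING CRON 0 2 * * * UTC'  -- daily at 02:00 UTC
AS
    CALL load_raw_sources();                -- hypothetical ingestion procedure

CREATE OR REPLACE TASK transform_child
    WAREHOUSE = transform_wh
    AFTER ingest_root                       -- runs once the root completes successfully
AS
    CALL run_transformations();             -- hypothetical transform procedure

-- Tasks are created suspended; resume children before the root.
ALTER TASK transform_child RESUME;
ALTER TASK ingest_root RESUME;
```

Keeping children on the AFTER dependency rather than on their own schedules is what gives you failure isolation: when the root run fails, the child is skipped instead of being fed stale data.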
4. Data Quality & Reliability
- Define and enforce data quality standards across datasets you own - automated validations, delta checks, row counts, time-based monitoring (see the sketch after this list)
- Build monitoring and alerting that surfaces problems before they reach the business
- Document data lineage, transformation assumptions, and technical decisions so the platform is understandable and maintainable as the team grows
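A minimal sketch of the delta-check idea, against a hypothetical served-layer table applications_served(load_date, ...) - the query returns a row only when the check fails, which is what alerting hooks into:

```sql
-- Sketch only: flag the load if today's row count falls below half of
-- yesterday's. Table name and the 0.5 ratio are illustrative.
SELECT
    COUNT_IF(load_date = CURRENT_DATE)     AS rows_today,
    COUNT_IF(load_date = CURRENT_DATE - 1) AS rows_yesterday
FROM applications_served
HAVING COUNT_IF(load_date = CURRENT_DATE)
     < 0.5 * COUNT_IF(load_date = CURRENT_DATE - 1);
```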
5. BI Platform & Reporting
- Maintain and enhance the existing Metabase BI layer and build reporting interfaces that non-technical stakeholders can actually use
- Collaborate with Product and Analytics teams to translate business needs into reliable technical solutions
- Communicate clearly across all four brands - be the person who can explain a data problem in business terms and a business problem in data terms
What Success Looks Like
You'll start by building a complete picture of the current state - which pipelines are fragile, where data quality is inconsistent, what the highest-impact improvements look like, and where activation use cases aren't yet built. You will have a point of view on where to move first.
From there, the infrastructure will be measurably more reliable. Pipelines that were breaking will run consistently. Data quality issues will be caught early or prevented entirely. The BI layer will be getting used - not just maintained.
Over time, the data platform will be something the business genuinely relies on - fast enough to support the pace of growth, reliable enough that data consumers trust what they are looking at, and built in a way that the next engineer who joins can understand and extend without starting from scratch.
About You
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field - or equivalent practical experience
Experience
- 5+ years building and maintaining production-grade data pipelines and data warehouses - not prototypes, but systems that real business decisions depend on
- Strong experience architecting and implementing large-scale business intelligence solutions
- Strong experience in data warehouse design
- Hands-on experience with the modern data stack: Snowflake (or an equivalent cloud warehouse), dbt, and an ingestion tool such as Hevo or Airbyte
- Advanced SQL proficiency: CTEs, window functions, QUALIFY, query optimisation, and understanding the difference between SQL that works and SQL that scales (see the example after this list)
- Experience with Snowflake-native features: Task DAGs, Dynamic Tables, Snowflake Cortex AI, AI_COMPLETE, warehouse sizing, and query profiling
- Experience with Airtable as an operational data layer - syncing data between Snowflake and Airtable, managing upsert logic, and keeping sync costs low
- Experience owning Snowflake cost monitoring and optimisation - identifying expensive queries, bloated sync jobs, and warehouse over-provisioning
- Experience with user-facing BI tools: Metabase, Looker, or similar
- Experience working in agile, ticket-based workflows (Jira)
- Experience building or integrating AI/LLM tooling into data engineering workflows - specifically Snowflake Cortex AI, or LLM APIs (OpenAI, Anthropic, etc.) for structured data tasks, fuzzy matching, or pipeline automation
- Familiarity with vector databases, embedding pipelines, or retrieval-augmented generation (RAG) infrastructure is a strong plus
- Exposure to MLOps or AI infrastructure patterns - model serving, feature stores, or AI monitoring - is an advantage
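As a flavour of the SQL bar here - an example, not a test - QUALIFY collapses the classic "latest record per entity" pattern that would otherwise need a self-join or nested subquery (table and columns are hypothetical):

```sql
-- Sketch only: the latest application status per student via QUALIFY.
SELECT student_id, status, updated_at
FROM application_events
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY student_id
    ORDER BY updated_at DESC
) = 1;
```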
Skills & Qualities
- You take ownership of reliability. When a pipeline fails, you don't just fix the immediate problem - you understand why it happened, what it affected downstream, and what needs to change so it doesn't happen again
- You communicate clearly with non-technical stakeholders. You've sat in a requirements conversation with a commercial or product team, understood what they were actually asking for, and built something that answered the real question
- You document your work well enough that someone else could maintain it - data lineage, transformation assumptions, technical decisions
- Comfortable working in a multi-brand, multi-stakeholder environment where data problems span different systems, teams, and geographies
- Strong problem-solving skills and attention to detail
- Ability to balance multiple priorities and make sound judgments about what to work on first
- You're genuinely excited about AI as a force multiplier for engineering teams - you actively use AI tools in your own workflow and have a point of view on how they can make data teams faster without sacrificing reliability
Why Manifest
We're building the infrastructure for global human capital mobility - the rails that move students, schools, universities, and employers across 50+ countries. Cialfo is in 2,000+ schools. Explore is trusted by 1,000+ universities. BridgeU runs across the UK, Europe, and the Middle East. Kaaiser has guided students across India and Southeast Asia since 1997.
The opportunity is real. $700B flows annually in remittances from migrant workers. 85M workers will be missing from developed economies by 2030. We're building the operating system that changes that.
$80M raised from Tiger Global, SIG, and Square Peg. Still early.
The team has already built the infrastructure for AI-native engineering - shared conventions, a live skills library, AI-assisted workflows across engineering, QE, product, and design. Saige is in production. Explore's AI capabilities are in production. This isn't an aspiration we're hiring you to bring to life. It's an operating system we're hiring you to extend, scale, and make permanent.


