Apna

Lead / Staff Data Engineer - Data Platform

Found on Mokaru

1 week ago

Salary

Not disclosed by employer

Job description

Company: Apna

Team: Data Platform / Engineering

Location: Bangalore

Experience: 5-7 years

Why Join Apna

At Apna, data is central to how we build products, understand users, improve employer outcomes, power recommendations, and scale decision-making. This role gives you the opportunity to build the backbone of Apna’s data platform and influence how data is used across the company.

You will work on real-world, high-scale problems across jobs, users, employers, communities, matching, growth, and AI-driven systems.

About the Role

Apna is looking for a Lead / Staff Data Engineer to build and scale our core data platform. This role will work on large-scale data pipelines, lakehouse architecture, query platforms, workflow orchestration, and data reliability systems that power analytics, product intelligence, machine learning, business dashboards, experimentation, and operational decision-making across Apna.

We are looking for someone who can think deeply about data architecture, design reliable pipelines, improve data quality, and help build a platform that can scale with Apna’s growth.

What You’ll Own

You will be responsible for designing, building, and operating critical parts of Apna’s data platform, including:

  • Building scalable batch and near-real-time data pipelines across product, business, growth, and ML use cases.
  • Designing and improving our lakehouse architecture using technologies like Apache Hudi.
  • Working with query engines such as Presto / Trino for large-scale analytical workloads.
  • Building and maintaining orchestration workflows using Apache Airflow.
  • Creating reusable data models, curated datasets, and reliable data marts for analytics and product teams.
  • Improving data platform reliability, observability, SLA tracking, lineage, and data quality checks.
  • Optimizing storage, compute, query performance, and pipeline costs.
  • Partnering with product, analytics, ML, and backend engineering teams to understand data needs and convert them into scalable platform solutions.
  • Driving engineering standards around data modeling, schema evolution, partitioning, deduplication, backfills, replayability, and pipeline ownership.
  • Mentoring data engineers and influencing architecture decisions across teams.

What We’re Looking For

Must Have

  • Strong experience in data engineering, preferably at scale.
  • Hands-on experience with Apache Airflow or similar orchestration systems.
  • Strong knowledge of Presto / Trino or other distributed query engines.
  • Good understanding of Apache Hudi concepts such as:
      • Copy-on-write vs merge-on-read
      • Upserts and deletes
      • Incremental reads
      • Compaction
      • Clustering
      • Timeline and commits
      • Schema evolution
      • Partitioning strategy
  • Strong knowledge of distributed data processing and storage systems.
  • Ability to design and build reliable ETL / ELT pipelines.
  • Strong SQL skills and ability to debug complex data issues.
  • Good understanding of different data architectures, including:
      • Data warehouse
      • Data lake
      • Lakehouse
      • Lambda architecture
      • Kappa architecture
      • Medallion architecture
      • Event-driven data architecture
  • Experience with data modeling for analytics and reporting.
  • Strong programming skills in at least one language such as Python, Java, or Scala.
  • Ability to reason about trade-offs between freshness, cost, reliability, latency, and complexity.
  • Strong debugging and production ownership mindset.

Good to Have

  • Experience with Kafka, Spark, Flink, Hive, Iceberg, Delta Lake, or BigQuery.
  • Experience building internal data platforms or self-serve data infrastructure.
  • Experience with data quality frameworks such as Great Expectations, Deequ, Soda, or custom validation systems.
  • Exposure to ML feature pipelines or feature stores.
  • Experience with metadata management, data catalogs, lineage, and governance.
  • Experience with cloud infrastructure such as AWS, GCP, or Azure.
  • Understanding of privacy, compliance, PII handling, and access control in data systems.

What Success Looks Like

In this role, success means:

  • Critical business and product datasets are reliable, discoverable, and trusted.
  • Pipelines are observable, recoverable, and have clear SLAs.
  • Query performance improves across major analytical workloads.
  • Data freshness and quality issues are significantly reduced.
  • Teams can build on top of the data platform faster without reinventing pipelines.
  • The platform can scale with Apna’s user, job, employer, and engagement data.