Genzeon

AI Data Architect

Company

Genzeon

Role

AI Data Architect

Job type

Full-time

Posted

3 days ago

Estimated salary

$82k - $210k (est. BLS 2024)

Job description

AI Data Architect | Healthcare AI Platform

Genzeon Corporation — Healthcare Division

Exton, PA / Hybrid | 0–4 years | Full-time

AI-native product architect with data engineering experience needed for a product build-out

The short version: We run a multi-model AI pipeline that processes 150K Medicare documents/year — faxed PDFs, EDI transactions, FHIR data, clinical notes. You’ll design and build the data architecture that ingests, stores, governs, and serves all of it to AI models and clinical reviewers. On-prem GPUs, hybrid cloud, HIPAA compliance. This is the real thing.

What you’ll do

Design the end-to-end data architecture for a healthcare AI platform: ingestion, storage, processing, serving, governance

Build pipelines for heterogeneous healthcare data: faxed PDFs, X12 EDI (835/837/278), FHIR R4, HL7v2, CMS files, unstructured clinical notes

Architect the data lake/lakehouse layer (Apache Iceberg, MinIO, DuckDB, PostgreSQL/pgvector)

Design the embedding and vector storage layer that powers RAG: chunking, indexing, retrieval optimization

Build data lineage tracking from source document to AI decision

Implement HIPAA/HITRUST data governance: encryption, access controls, audit logging, PHI handling

Monitor data quality across the pipeline: schema drift, completeness, freshness, anomalies

Optimize for hybrid infrastructure: on-prem GPUs (RTX 5090, L40S), NAS, Azure GovCloud, Azure Commercial

What you need

A data pipeline you’ve built that ran in production (we’ll ask about it)

SQL fluency and Python proficiency

Experience with at least one of: Spark, dbt, Airflow, Dagster, Prefect

Hands-on work with unstructured or semi-structured data — PDFs, images, OCR outputs, free text

Practical understanding of vector databases, embeddings, and how RAG systems consume data

Comfort with on-premises infrastructure, not just managed cloud services

Data quality and governance as instincts, not afterthoughts

Strong signals

Healthcare data formats (X12 EDI, FHIR, HL7, CCD/C-CDA)

Apache Iceberg, Delta Lake, or modern table formats

MinIO / S3 / object storage architecture

pgvector, Pinecone, Weaviate, or similar vector stores

DuckDB or embedded analytical engines

HIPAA technical safeguards implementation

ML data pipelines — training data, feature stores, evaluation sets, feedback loops

We don’t require

A data engineering bootcamp cert

Mastery of the entire “modern data stack”

Prior healthcare experience (but it helps)

A specific degree

To apply, submit

  • Resume
  • Link to a data project you’ve built (GitHub, architecture diagram, write-up)
  • 200 words max: “Describe the messiest data problem you’ve encountered. How did you solve it?”
