Our client, Spar Information Systems, is seeking the following.

AI/ML Engineer - Customer Data Platform

Location: Atlanta/Frisco

CDP MISSION: Our mission is to be the authoritative source of truth for customer data - delivering timely, high-quality data at scale to power the contextual experiences that drive the growth of this company. Every customer profile must be accurate, trusted, and available when it matters, across every touchpoint, for the entire US adult population.

Job Overview

We are seeking an AI/ML Engineer to build the intelligent systems that power identity resolution and data accessibility within our Customer Data Platform (CDP) - the authoritative source of truth for customer data across the entire US adult population.

This role focuses on developing machine learning pipelines that deduplicate, link, and resolve customer identities across disparate data sources - the core capability that transforms raw data into trusted, unified customer profiles. You will also contribute to LLM-based solutions that enable natural language querying of CDP data, making the platform accessible to business users across the organization.

You will work on both classical ML techniques and modern LLM-based approaches to ensure that every customer identity in CDP is accurately resolved, every profile is trustworthy, and every user can access the data they need.

Job Responsibilities - Identity Resolution

Develop and deploy entity resolution models to match and deduplicate customer records across multiple systems - directly impacting the accuracy of CDP as the source of truth
Implement probabilistic matching techniques (e.g., Fellegi-Sunter) and ML models (gradient boosting, neural classifiers) for record linkage across the US adult population
Build candidate blocking pipelines using phonetic algorithms (Soundex, Double Metaphone), token similarity, and LSH to handle billions of potential match pairs efficiently
Apply fuzzy matching techniques (Levenshtein, Jaro-Winkler, Jaccard) for customer attributes such as name, address, phone, and identifiers
Develop clustering algorithms (DBSCAN, hierarchical clustering) to create unified "golden customer profiles" that serve as the authoritative representation of each individual
Build embedding-based similarity systems using Sentence-BERT or transformer-based models for semantic matching
Implement ANN/KNN retrieval systems (FAISS, Annoy) for large-scale entity matching across population-scale datasets

Job Responsibilities - AI/LLM

Use LLMs (e.g., GPT, Claude) for classification and disambiguation of entity matches, improving resolution accuracy where traditional methods fall short
Build and support RAG pipelines to enrich customer profiles with contextual data from unstructured sources
Perform prompt engineering and evaluation for structured data extraction from unstructured inputs feeding into CDP
Contribute to NLQ-to-SQL systems, enabling business users to query CDP data using natural language - making the authoritative source of truth accessible to non-technical stakeholders
Support integration with vector databases (e.g., Pinecone, pgvector, Qdrant) for semantic search across customer data

Education And Work Experience

Bachelor's or Master's degree in Computer Science, Data Science, or a related field
3+ years of experience in ML/AI engineering
At least 1 year of experience in entity resolution, record linkage, or deduplication - ideally at scale

Technical Skills

Programming: Python (required)
Libraries: scikit-learn, HuggingFace Transformers, RapidFuzz, jellyfish
Experience with LLM APIs (OpenAI, Anthropic) and prompt pipelines
Strong SQL skills and experience with Spark or Dask for distributed processing
Familiarity with vector databases and embedding-based retrieval
Experience with ML lifecycle tools (MLflow or similar)
Understanding of data quality metrics and how identity resolution impacts downstream trust

Knowledge, Skills, And Abilities

Strong understanding of ML fundamentals and similarity matching techniques applied to customer identity
Ability to work with large, messy, real-world datasets spanning hundreds of millions of records
Understanding of precision/recall tradeoffs in identity resolution and their impact on data trust
Good problem-solving and analytical skills
Ability to collaborate with data engineering, platform, and business teams to deliver accurate customer profiles

Licenses and Certifications

At least 18 years of age
Legally authorized to work in the United States

Travel

Travel Required: No

AI/ML Engineer required in Atlanta/Frisco (hybrid onsite at both locations)
