Jobs via Dice

AI/ML Engineer required in Atlanta/Frisco (hybrid onsite at both locations)

Role

AI/ML Engineer required in Atlanta/Frisco (hybrid onsite at both locations)

Job type

Full-time

Posted

21 hours ago

Salary

Not disclosed by employer

Job description

Dice is the leading career destination for tech experts at every stage of their careers. Our client, Spar Information Systems, is seeking the following. Apply via Dice today!

AI/ML Engineer - Customer Data Platform

Location: Atlanta/Frisco

CDP MISSION: Our mission is to be the authoritative source of truth for customer data - delivering timely, high-quality data at scale to power the contextual experiences that drive the growth of this company. Every customer profile must be accurate, trusted, and available when it matters, across every touchpoint, for the entire US adult population.

Job Overview

We are seeking an AI/ML Engineer to build the intelligent systems that power identity resolution and data accessibility within our Customer Data Platform (CDP) - the authoritative source of truth for customer data across the entire US adult population.

This role focuses on developing machine learning pipelines that deduplicate, link, and resolve customer identities across disparate data sources - the core capability that transforms raw data into trusted, unified customer profiles. You will also contribute to LLM-based solutions that enable natural language querying of CDP data, making the platform accessible to business users across the organization.

You will work on both classical ML techniques and modern LLM-based approaches to ensure that every customer identity in CDP is accurately resolved, every profile is trustworthy, and every user can access the data they need.

Job Responsibilities - Identity Resolution

  • Develop and deploy entity resolution models to match and deduplicate customer records across multiple systems - directly impacting the accuracy of CDP as the source of truth
  • Implement probabilistic matching techniques (e.g., Fellegi-Sunter) and ML models (gradient boosting, neural classifiers) for record linkage across the US adult population
  • Build candidate blocking pipelines using phonetic algorithms (Soundex, Double Metaphone), token similarity, and locality-sensitive hashing (LSH) to handle billions of potential match pairs efficiently
  • Apply fuzzy matching techniques (Levenshtein, Jaro-Winkler, Jaccard) for customer attributes such as name, address, phone, and identifiers
  • Develop clustering algorithms (DBSCAN, hierarchical clustering) to create unified "golden customer profiles" that serve as the authoritative representation of each individual
  • Build embedding-based similarity systems using Sentence-BERT or transformer-based models for semantic matching
  • Implement approximate nearest-neighbor (ANN) and k-nearest-neighbor (KNN) retrieval systems (FAISS, Annoy) for entity matching across population-scale datasets

Job Responsibilities - AI/LLM

  • Use LLMs (e.g., GPT, Claude) for classification and disambiguation of entity matches, improving resolution accuracy where traditional methods fall short
  • Build and support retrieval-augmented generation (RAG) pipelines to enrich customer profiles with contextual data from unstructured sources
  • Perform prompt engineering and evaluation for structured data extraction from unstructured inputs feeding into CDP
  • Contribute to NLQ-to-SQL (natural language query to SQL) systems that let business users query CDP data without writing SQL - making the authoritative source of truth accessible to non-technical stakeholders
  • Support integration with vector databases (e.g., Pinecone, pgvector, Qdrant) for semantic search across customer data

Education and Work Experience

  • Bachelor's or Master's degree in Computer Science, Data Science, or related field
  • 3+ years of experience in ML/AI engineering
  • At least 1 year of experience in entity resolution, record linkage, or deduplication - ideally at scale

Technical Skills

  • Programming: Python (required)
  • Libraries: scikit-learn, Hugging Face Transformers, RapidFuzz, jellyfish
  • Experience with LLM APIs (OpenAI, Anthropic) and prompt pipelines
  • Strong SQL skills and experience with Spark or Dask for distributed processing
  • Familiarity with vector databases and embedding-based retrieval
  • Experience with ML lifecycle tools (MLflow or similar)
  • Understanding of data quality metrics and how identity resolution impacts downstream trust

Knowledge, Skills, and Abilities

  • Strong understanding of ML fundamentals and similarity matching techniques applied to customer identity
  • Ability to work with large, messy, real-world datasets spanning hundreds of millions of records
  • Understanding of precision/recall tradeoffs in identity resolution and their impact on data trust
  • Good problem-solving and analytical skills
  • Ability to collaborate with data engineering, platform, and business teams to deliver accurate customer profiles

Licenses and Certifications

  • At least 18 years of age
  • Legally authorized to work in the United States

Travel

Travel Required: No
