Jobs via Dice
AI/ML Engineer, Atlanta/Frisco (hybrid onsite at both locations)
Job type: Full-time
Posted: 21 hours ago
Job description
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Spar Information Systems, is seeking the following. Apply via Dice today!
AI/ML Engineer - Customer Data Platform
Location: Atlanta/Frisco
CDP MISSION: Our mission is to be the authoritative source of truth for customer data, delivering timely, high-quality data at scale to power the contextual experiences that drive the growth of this company. Every customer profile must be accurate, trusted, and available when it matters, across every touchpoint, for the entire US adult population.
Job Overview
We are seeking an AI/ML Engineer to build the intelligent systems that power identity resolution and data accessibility within our Customer Data Platform (CDP), the authoritative source of truth for customer data across the entire US adult population.
This role focuses on developing machine learning pipelines that deduplicate, link, and resolve customer identities across disparate data sources. This is the core capability that transforms raw data into trusted, unified customer profiles. You will also contribute to LLM-based solutions that enable natural language querying of CDP data, making the platform accessible to business users across the organization.
You will work on both classical ML techniques and modern LLM-based approaches to ensure that every customer identity in CDP is accurately resolved, every profile is trustworthy, and every user can access the data they need.
Job Responsibilities - Identity Resolution
- Develop and deploy entity resolution models to match and deduplicate customer records across multiple systems, directly affecting the accuracy of CDP as the source of truth
- Implement probabilistic matching techniques (e.g., Fellegi-Sunter) and ML models (gradient boosting, neural classifiers) for record linkage across the US adult population
- Build candidate blocking pipelines using phonetic algorithms (Soundex, Double Metaphone), token similarity, and locality-sensitive hashing (LSH) to handle billions of potential match pairs efficiently
- Apply fuzzy matching techniques (Levenshtein, Jaro-Winkler, Jaccard) for customer attributes such as name, address, phone, and identifiers
- Develop clustering algorithms (DBSCAN, hierarchical clustering) to create unified "golden customer profiles" that serve as the authoritative representation of each individual
- Build embedding-based similarity systems using Sentence-BERT or transformer-based models for semantic matching
- Implement approximate nearest neighbor (ANN) and k-nearest neighbor (KNN) retrieval systems (FAISS, Annoy) for entity matching across population-scale datasets
Job Responsibilities - AI/LLM
- Use LLMs (e.g., GPT, Claude) for classification and disambiguation of entity matches, improving resolution accuracy where traditional methods fall short
- Build and support retrieval-augmented generation (RAG) pipelines to enrich customer profiles with contextual data from unstructured sources
- Perform prompt engineering and evaluation for structured data extraction from unstructured inputs feeding into CDP
- Contribute to natural-language-query-to-SQL (NLQ-to-SQL) systems that let business users query CDP data in natural language, making the authoritative source of truth accessible to non-technical stakeholders
- Support integration with vector databases (e.g., Pinecone, pgvector, Qdrant) for semantic search across customer data
Education And Work Experience
- Bachelor's or Master's degree in Computer Science, Data Science, or related field
- 3+ years of experience in ML/AI engineering
- At least 1 year of experience in entity resolution, record linkage, or deduplication, ideally at scale
Technical Skills
- Programming: Python (required)
- Libraries: scikit-learn, Hugging Face Transformers, RapidFuzz, jellyfish
- Experience with LLM APIs (OpenAI, Anthropic) and prompt pipelines
- Strong SQL skills and experience with Spark or Dask for distributed processing
- Familiarity with vector databases and embedding-based retrieval
- Experience with ML lifecycle tools (MLflow or similar)
- Understanding of data quality metrics and how identity resolution impacts downstream trust
Knowledge, Skills, And Abilities
- Strong understanding of ML fundamentals and similarity matching techniques applied to customer identity
- Ability to work with large, messy, real-world datasets spanning hundreds of millions of records
- Understanding of precision/recall tradeoffs in identity resolution and their impact on data trust
- Good problem-solving and analytical skills
- Ability to collaborate with data engineering, platform, and business teams to deliver accurate customer profiles
Eligibility Requirements
- At least 18 years of age
- Legally authorized to work in the United States
Travel
Travel Required: No