Role: Senior Data Quality Engineer (4-Month Contract), Onsite in UAE - Octopus by RTG
Found on Mokaru
Yesterday
Job description
About the Role
We are seeking an experienced Senior Databricks Data Quality Engineer to lead the design, implementation, and automation of enterprise-scale data quality frameworks within a Databricks environment. The successful candidate will play a key role in establishing data quality controls, profiling frameworks, remediation processes, and AI-assisted quality monitoring across a large-scale data platform consisting of 170+ datasets and over 1,300 Critical Data Elements (CDEs).
This role requires strong expertise in Databricks, PySpark, Delta Lake, MLflow, and modern data quality management practices.
Key Responsibilities
Data Platform & Databricks Configuration
- Configure and manage Databricks workspaces, compute clusters, PySpark notebooks, Delta Lake architecture, and Unity Catalog integrations.
- Design scalable data quality processing frameworks across 170+ datasets and 1,346 prioritized Critical Data Elements (CDEs).
Data Profiling & Quality Assessment
- Develop AI-assisted profiling notebooks using PySpark to establish baseline data quality scores.
- Assess data quality across six key dimensions:
  - Completeness
  - Uniqueness
  - Validity
  - Consistency
  - Accuracy
  - Timeliness
- Analyze null rates, duplicate records, invalid values, format violations, outliers, and schema drift.
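To illustrate what a baseline profiling pass computes, here is a minimal plain-Python sketch covering three of the six dimensions for a single column. The helper name `profile_column` and the score definitions (for example, uniqueness as distinct non-null values over non-null values) are assumptions rather than a prescribed method; in Databricks the same counts would typically come from PySpark aggregations over a Delta table.

```python
import re
from typing import Any, Optional


def profile_column(records: list[dict[str, Any]], column: str,
                   pattern: Optional[str] = None) -> dict[str, float]:
    """Baseline scores for one column (hypothetical helper, illustrative definitions).

    null_rate    = share of rows where the value is missing (None)
    completeness = share of rows where the value is present
    uniqueness   = distinct non-null values / non-null values
    validity     = share of non-null values fully matching `pattern` (only if given)
    """
    total = len(records)
    non_null = [r.get(column) for r in records if r.get(column) is not None]
    scores = {
        "null_rate": (total - len(non_null)) / total if total else 0.0,
        "completeness": len(non_null) / total if total else 0.0,
        "uniqueness": len(set(non_null)) / len(non_null) if non_null else 0.0,
    }
    if pattern is not None:
        regex = re.compile(pattern)
        valid = sum(1 for v in non_null if regex.fullmatch(str(v)))
        scores["validity"] = valid / len(non_null) if non_null else 0.0
    return scores
```

For example, a column holding `"a@x.com"`, `"a@x.com"`, `None` and `"bad"`, checked against the pattern `[^@\s]+@[^@\s]+\.[a-z]+`, scores 0.75 completeness, a 0.25 null rate, and roughly 0.67 for both uniqueness and validity.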
Data Quality Rule Framework
- Design and build a scalable Data Quality Rule Factory using parameterized PySpark functions.
- Enable automated deployment of over 6,700 data quality rules without manual rule-by-rule development.
- Create reusable rule templates across datasets and data quality dimensions.
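One way to read "Rule Factory" is metadata-driven generation: each rule is a row of configuration (dataset, column, template, parameters), and one generic function turns those rows into executable checks, so adding another rule means adding a config row rather than writing code. The sketch below shows that idea in plain Python; the template names, the `build_rules`/`evaluate` helpers and the rule-ID format are all hypothetical. A PySpark version would more likely turn each template into a Column expression and evaluate all rules in a single pass over the DataFrame.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable

Predicate = Callable[[Any], bool]


@dataclass(frozen=True)
class Rule:
    rule_id: str
    column: str      # the Critical Data Element the rule checks
    dimension: str   # e.g. "Completeness" or "Validity"
    check: Predicate


# Parameterized templates: each factory takes rule parameters and returns a predicate.
def _not_null() -> Predicate:
    return lambda v: v is not None


def _in_set(allowed: list) -> Predicate:
    allowed_set = frozenset(allowed)
    return lambda v: v in allowed_set


def _regex(pattern: str) -> Predicate:
    rx = re.compile(pattern)
    return lambda v: v is not None and rx.fullmatch(str(v)) is not None


def _in_range(lo: float, hi: float) -> Predicate:
    return lambda v: v is not None and lo <= v <= hi


TEMPLATES: dict[str, tuple[str, Callable[..., Predicate]]] = {
    "not_null": ("Completeness", _not_null),
    "in_set": ("Validity", _in_set),
    "regex": ("Validity", _regex),
    "range": ("Validity", _in_range),
}


def build_rules(specs: list[dict[str, Any]]) -> list[Rule]:
    """Turn metadata rows into Rule objects through one generic code path."""
    rules = []
    for i, spec in enumerate(specs, start=1):
        dimension, factory = TEMPLATES[spec["template"]]
        rules.append(Rule(
            rule_id=f"{spec['dataset']}.{spec['column']}.{spec['template']}.{i}",
            column=spec["column"],
            dimension=dimension,
            check=factory(**spec.get("params", {})),
        ))
    return rules


def evaluate(rules: list[Rule], record: dict[str, Any]) -> list[str]:
    """Return the IDs of the rules that `record` fails."""
    return [r.rule_id for r in rules if not r.check(record.get(r.column))]
```

Because templates are shared, the same `in_set` or `range` factory serves every dataset and CDE that needs it; only the configuration rows differ.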
Pipeline Quality Enforcement
- Integrate data quality controls within Bronze, Silver, and Gold Delta Lake layers.
- Implement quality gates that block data from progressing to the next layer unless predefined quality thresholds are met.
- Develop reusable Databricks Jobs for automated validation and monitoring.
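At its core, a quality gate is a threshold check that decides whether a batch may be written to the next layer. The minimal sketch below assumes record-level pass/fail results are already available (for example from rule evaluation) and uses hypothetical per-layer thresholds and names. Raising an exception is a deliberate choice: in a Databricks Job an uncaught exception fails the task, and dependent tasks do not run by default.

```python
from dataclasses import dataclass

# Hypothetical thresholds: minimum share of records that must pass all rules.
LAYER_THRESHOLDS = {"silver": 0.95, "gold": 0.99}


@dataclass(frozen=True)
class GateResult:
    passed: bool
    pass_rate: float
    threshold: float


class QualityGateError(RuntimeError):
    """Raised when a batch does not meet its layer's threshold."""


def evaluate_gate(record_passed: list[bool], threshold: float) -> GateResult:
    """Compute the batch pass rate and compare it with the threshold."""
    if not record_passed:
        # An empty batch provides no evidence of quality, so it does not pass.
        return GateResult(False, 0.0, threshold)
    rate = sum(record_passed) / len(record_passed)
    return GateResult(rate >= threshold, rate, threshold)


def promote(batch: list, record_passed: list[bool], layer: str, target: list) -> GateResult:
    """Append `batch` to `target` only if the gate for `layer` passes."""
    gate = evaluate_gate(record_passed, LAYER_THRESHOLDS[layer])
    if not gate.passed:
        raise QualityGateError(
            f"{layer} gate failed: pass rate {gate.pass_rate:.2%} < {gate.threshold:.2%}"
        )
    target.extend(batch)  # stand-in for a Delta write to the next layer
    return gate
```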
Data Cleansing & AI-Driven Remediation
- Build automated data cleansing pipelines for:
  - Standardization
  - Deduplication
  - Schema harmonization
- Deploy MLflow-managed machine learning models for:
  - Anomaly detection
  - Fuzzy duplicate detection
  - Exact duplicate identification
- Ensure explainability of model outputs and support human-in-the-loop validation processes.
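The role calls for MLflow-managed models for duplicate detection. As a simpler stand-in that shows the shape of the problem, the sketch below standardizes names and then separates exact duplicates from near-duplicates using `difflib.SequenceMatcher` from the Python standard library. This is a string-similarity heuristic, not a machine learning model, and the 0.9 threshold and helper names are assumptions. Pairwise comparison is quadratic, so at scale candidate pairs would normally be narrowed first (for example by blocking on a shared key).

```python
from difflib import SequenceMatcher
from itertools import combinations


def standardize(name: str) -> str:
    """Lower-case, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def find_duplicates(names: list[str], threshold: float = 0.9):
    """Return (exact pairs, fuzzy pairs with similarity) over standardized names."""
    std = [standardize(n) for n in names]
    exact, fuzzy = [], []
    for i, j in combinations(range(len(std)), 2):
        if std[i] == std[j]:
            exact.append((i, j))
            continue
        score = SequenceMatcher(None, std[i], std[j]).ratio()
        if score >= threshold:
            fuzzy.append((i, j, round(score, 3)))
    return exact, fuzzy
```

For `["Acme Trading LLC", "ACME  Trading, LLC", "Acme Tradng LLC", "Globex"]`, the first two standardize to the same string and are reported as an exact pair, while the misspelled third entry is reported as a near-duplicate of both. Keeping the similarity score in the output gives a reviewer something concrete to confirm or reject in a human-in-the-loop step.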
Exception Management
- Design failed-record handling frameworks and quarantine Delta tables.
- Capture failure reasons, affected CDEs, rule references, and timestamps.
- Develop automated reprocessing mechanisms for corrected records.
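The failed-record pattern can be sketched as a split of each batch into clean records and quarantine entries that carry the failure metadata listed above (reason, affected CDE, rule reference, timestamp). In Databricks the quarantine entries would typically be appended to a dedicated Delta table; here they are plain dictionaries, and the rule-registry layout and function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Optional

# Hypothetical rule registry: rule_id -> (CDE column, predicate, failure reason)
RuleRegistry = dict[str, tuple[str, Callable[[Any], bool], str]]


def split_batch(records: list[dict[str, Any]], rules: RuleRegistry,
                now: Optional[datetime] = None) -> tuple[list[dict], list[dict]]:
    """Return (clean records, quarantine entries with failure details)."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    clean, quarantine = [], []
    for rec in records:
        failures = [
            {"rule_id": rule_id, "cde": column, "reason": reason}
            for rule_id, (column, check, reason) in rules.items()
            if not check(rec.get(column))
        ]
        if failures:
            quarantine.append({"record": rec, "failures": failures, "quarantined_at": ts})
        else:
            clean.append(rec)
    return clean, quarantine


def reprocess(quarantine: list[dict], rules: RuleRegistry,
              now: Optional[datetime] = None) -> tuple[list[dict], list[dict]]:
    """Re-validate quarantined records after correction; still-failing ones stay quarantined."""
    return split_batch([entry["record"] for entry in quarantine], rules, now)
```

Reprocessing runs corrected records through exactly the same rules as the original load, so a fix is only accepted once it actually passes.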
Data Quality Monitoring & Reporting
- Build Delta Lake aggregation tables for data quality metrics.
- Deliver data quality KPIs to Power BI dashboards, including:
  - Dimension-level scores
  - Rule pass/fail rates
  - SLA adherence metrics
- Configure automated alerting using Databricks SQL Alerts and Azure Monitor.
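Before metrics reach Power BI, rule-level results are typically rolled up into dimension-level scores. The sketch below assumes each rule run yields a `(dataset, dimension, rule_id, passed, total)` tuple and scores a dimension as total passed records over total evaluated records. That is one possible weighting among several. The SLA check mirrors the kind of condition a Databricks SQL Alert would evaluate against the aggregation table. All names are hypothetical.

```python
from collections import defaultdict


def aggregate(results):
    """results: iterable of (dataset, dimension, rule_id, passed, total) tuples.

    Returns (pass rate per rule, score per (dataset, dimension)).
    """
    by_rule = {}
    sums = defaultdict(lambda: [0, 0])  # (dataset, dimension) -> [passed, total]
    for dataset, dimension, rule_id, passed, total in results:
        by_rule[rule_id] = passed / total if total else None
        acc = sums[(dataset, dimension)]
        acc[0] += passed
        acc[1] += total
    by_dimension = {key: p / t for key, (p, t) in sums.items() if t}
    return by_rule, by_dimension


def sla_breaches(by_dimension, sla):
    """(dataset, dimension) pairs whose score is below the SLA target, sorted."""
    return sorted(key for key, score in by_dimension.items() if score < sla)
```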
Predictive Data Quality Analytics
- Develop predictive models to identify datasets at risk of quality degradation.
- Support AI-assisted Root Cause Analysis (RCA) using profiling outputs and machine learning techniques.
- Export and prepare remediation datasets for prioritization and governance reporting.
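As a deliberately simple baseline for flagging datasets at risk of degradation, the sketch below fits a least-squares trend line to each dataset's daily quality score. It flags a dataset when the projected score at the end of a chosen horizon falls below a threshold. This is linear extrapolation, not a trained ML model, and the function names, horizon and threshold are assumptions.

```python
def projected_score(history: list[float], days_ahead: int) -> float:
    """Fit score = a + b * day by least squares and evaluate days_ahead past the last day."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + days_ahead)


def at_risk(histories: dict[str, list[float]], threshold: float,
            days_ahead: int = 7) -> list[str]:
    """Datasets whose projected score at the end of the horizon is below `threshold`."""
    return sorted(name for name, h in histories.items()
                  if projected_score(h, days_ahead) < threshold)
```

A dataset whose score slips by one point a day (0.99, 0.98, 0.97, 0.96) projects to about 0.89 a week later and would be flagged against a 0.95 threshold, while a flat series would not.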
Required Qualifications
- Bachelor's degree in Computer Science, Data Engineering, Information Systems, or a related field.
- 5+ years of experience in Data Engineering or Data Quality Engineering.
- 3+ years of hands-on experience with Databricks and PySpark.
- Strong expertise in Delta Lake architecture and data pipeline development.
- Experience with Unity Catalog implementation and governance.
- Hands-on experience with MLflow and machine learning deployment.
- Strong SQL skills and data modeling expertise.
- Experience building enterprise-scale data quality frameworks.
- Experience integrating Databricks with Power BI and Azure services.
- Strong understanding of data governance, metadata management, and data quality dimensions.
Preferred Qualifications
- Microsoft Azure certifications.
- Databricks Certified Data Engineer Associate or Professional.
- Experience with enterprise data governance programs.
- Experience implementing AI-assisted data quality and remediation solutions.
- Knowledge of Master Data Management (MDM) principles.


