Novia Infotech
Role: Senior Big Data Engineer (Cloudera / Spark / Kafka)
Job Type: Contractor
Posted: 23 hours ago
Job Title: Sr. IT Data Engineer
Location: Washington, DC (Onsite)
Job Type: Contract on W2
Duration: 8-month contract with potential extension
Visa: U.S. Citizens only
Interview Process
- Phone screen with the recruiter
- Video interview with the client
- Final interview onsite
Job Description
The Sr. IT Data Engineer supports the planning and execution of policies, practices, and projects that acquire, control, protect, and enhance the value of enterprise data assets. This role is responsible for building and maintaining robust data pipelines that ingest, cleanse, transform, and aggregate structured and unstructured data from multiple sources into scalable database and analytics environments.
The ideal candidate brings strong experience in modern data engineering, distributed data platforms, and DevOps-enabled development practices, with a focus on reliability, data quality, and performance optimization.
Key Responsibilities
- Design, develop, maintain, monitor, and support enterprise data pipelines and processing systems within Cloudera Data Platform environments.
- Build scalable ETL/ELT workflows that ingest data from a variety of internal and external sources in the required formats while ensuring compliance with data quality standards.
- Cleanse, transform, enrich, and aggregate unstructured or disparate data into usable datasets and databases.
- Troubleshoot and resolve data flow, integration, and content issues across upstream and downstream systems.
- Optimize performance of data solutions that process multiple streams of input data.
- Develop and maintain code-based data engineering solutions using Python, SQL, and related tools such as PySpark, pandas, and dbt.
- Work with distributed data and computing technologies including Apache NiFi, Hadoop, MapReduce, Hive, HBase, Kafka, and Spark.
- Support source control, release management, and automated deployment processes using Git and DevOps/CI/CD practices.
- Implement and maintain continuous integration and continuous delivery pipelines for data platform solutions.
- Work in Agile delivery environments using Scrum and Kanban methodologies.
- Perform development and support activities in UNIX/Linux environments, including shell scripting and command-line operations.
- Collaborate with cross-functional teams, including developers, analysts, architects, and business stakeholders, to deliver reliable and scalable data solutions.
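As a rough illustration of the cleanse → transform → aggregate flow the responsibilities above describe, the sketch below uses plain Python with invented field names; in this role the equivalent logic would typically run in PySpark or pandas on the Cloudera platform.

```python
from collections import defaultdict

# Hypothetical raw records from disparate sources: inconsistent
# casing, stray whitespace, and missing values.
raw = [
    {"region": "East ", "amount": "120.5"},
    {"region": "east", "amount": "79.5"},
    {"region": "West", "amount": None},   # missing value: dropped during cleansing
    {"region": "WEST", "amount": "200.0"},
]

def cleanse(rec):
    """Normalize the region key and coerce amount; drop incomplete records."""
    if rec["amount"] is None:
        return None
    return {"region": rec["region"].strip().lower(),
            "amount": float(rec["amount"])}

def aggregate(records):
    """Sum amounts per region after cleansing."""
    totals = defaultdict(float)
    for rec in filter(None, map(cleanse, records)):
        totals[rec["region"]] += rec["amount"]
    return dict(totals)

print(aggregate(raw))  # {'east': 200.0, 'west': 200.0}
```

The same shape (filter out bad records, normalize keys, group and sum) maps directly onto a Spark `filter`/`withColumn`/`groupBy().agg()` pipeline at scale.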
Required Qualifications
- 5+ years of experience in application and/or data development, including strong hands-on experience with Python.
- 5+ years of experience with data integration and ingestion tools, including Apache NiFi.
- Demonstrated proficiency developing, maintaining, monitoring, and supporting long-term operation of data pipelines or processing systems in Cloudera Data Platform.
- Advanced knowledge of SQL and Java.
- Strong experience with Microsoft SQL Server.
- Advanced knowledge of distributed data and computing platforms, including:
  - Apache NiFi
  - Hadoop
  - MapReduce
  - Hive
  - HBase
  - Kafka
  - Spark
- Experience with PySpark, pandas, and/or dbt.
- Experience with Git and DevOps-enabled development practices.
- Experience implementing and maintaining CI/CD pipelines and supporting data platform management.
- Experience with Scrum and Kanban methodologies.
- Experience with UNIX/Linux, including basic commands and shell scripting.
- Strong analytical, troubleshooting, and problem-solving skills.
- Strong written and verbal communication skills.
- U.S. Citizenship required.
Preferred Qualifications
- Experience supporting enterprise data environments in regulated or large-scale organizations.
- Experience with data quality controls, monitoring, and operational support for production pipelines.
- Experience working with streaming and batch data architectures.
- Familiarity with performance tuning and optimization in large distributed environments.