Biogensandbox
Data Lake Systems Engineer (Advanced Computing)
Role: Data Lake Systems Engineer (Advanced Computing)
Job type: Full-time
Posted: 83 months ago
Job description
Reporting to the Head of Data Lake and HPC, the Data Lake Systems Engineer will play a critical role in ensuring the usability, availability, and reliability of the Data Lake computational and storage infrastructure. The role involves performing systems administration and maintenance tasks, fulfilling user requests, resolving incidents and outages, providing comprehensive user training, maintaining high-quality documentation, participating in the design and deployment of complex IT systems, and collaborating closely with the research community to make more effective use of our Data Lake infrastructure.
This person will oversee a team of contractors to ensure that the key responsibilities listed below are accomplished – this person will own delivery of these services to our customers.
Key responsibilities
- Implementation and administration of the Data Lake environment.
- Monitoring and managing the Hadoop services on HDP Production and DR clusters.
- Maintaining and monitoring jobs in the Production, UAT, and Dev environments.
- Making code changes and deploying updated code in the UAT and Production environments.
- Deploying code changes on the R Shiny server as requested by users.
- Implementing and monitoring Oozie scheduled jobs for the UAT, Dev, and Production environments.
- Implementing patching activities and applying fixes provided by Hortonworks to the Data Lake environment.
- Troubleshooting job failures, mostly in Hive and Spark jobs, across the Data Lake environment.
- Onboarding new users to the Hadoop Data Lake environment.
- Gathering requirements for creating databases in Hive and providing policy-based access management through Ranger for new POCs.
- Supporting developers in executing ad hoc jobs in Hive environments for existing POCs such as enrollment_forecaster.
- Managing and enforcing access policies in Ranger for HDFS home directories and at the Hive schema, table, and column level.
The ideal candidate will possess an accomplished professional track record combined with exceptional technical acumen, self-confidence, excellent written and verbal communication skills, and a demonstrated potential to continually grow into new responsibilities.
She/he must be a self-motivated team player who is coachable, flexible, resilient, and comfortable working under high pressure with multiple deadlines and minimal supervision in a dynamic research environment.
- 5 to 8 years of progressively complex related experience in data engineering.
- 3 to 4 years of working experience with the big data stack on HDP or similar environments.
- AWS Big Data Certification is strongly preferred.
- Expertise in various data
The Data Lake Systems Engineer is a critical member of the High Performance Computing (Data Lake) Team reporting into the Infrastructure & Operations IT organization, which provides support for the Biogen global research community.


