Aricent

PySpark/Databricks Engineer - Big Data Technologies


Job Location

in, India

Job Description

Job: PySpark/Databricks Engineer (open for multiple locations, with work-from-office (WFO) and work-from-home (WFH) options)

We are looking for a PySpark solutions developer and data engineer who can design and build solutions for one of our Fortune 500 client programs, which aims to build a Hadoop cluster based on data standardization and curation. This high-visibility, fast-paced key initiative will integrate data across internal and external sources, provide analytical insights, and integrate with the customer's critical systems.

Key Responsibilities:
- Ability to design, build, and unit test applications on the Spark framework in Python.
- Build PySpark-based applications for both batch and streaming requirements, which requires in-depth knowledge of most Hadoop and NoSQL databases.
- Develop and execute data pipeline testing processes, and validate business rules and policies.
- Optimize the performance of the Spark applications built in Hadoop using configurations around SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Optimize performance for data access requirements by choosing the appropriate native Hadoop file formats (Avro, Parquet, ORC, etc.) and compression codecs.
- Ability to design and build real-time applications using Apache Kafka and Spark Streaming.
- Build integrated solutions leveraging Unix shell scripting, RDBMS, Hive, HDFS, HDFS file types, and HDFS compression codecs.
- Build data tokenization libraries and integrate them with Hive and Spark for column-level obfuscation.
- Experience processing large amounts of structured and unstructured data, including integrating data from multiple sources.
- Create and maintain an integration and regression testing framework on Jenkins, integrated with Bitbucket and/or Git repositories.
- Participate in the agile development process, and document and communicate issues and bugs related to data standards in scrum meetings.
- Work collaboratively with onsite and offshore teams.
- Develop and review technical documentation for delivered artifacts.
- Ability to solve complex data-driven scenarios and to triage defects and production issues.
- Ability to learn, unlearn, and relearn concepts with an open and analytical mindset.
- Participate in code releases and production deployments.
- Challenge and inspire team members to achieve business results in a fast-paced, quickly changing environment.

Qualifications:
- BE/B.Tech/B.Sc. in Computer Science, Statistics, or Econometrics from an accredited college or university.
- Minimum of 3 years of extensive experience designing, building, and deploying PySpark-based applications.
- Expertise in handling complex, large-scale Big Data environments (preferably 20 TB).
- Minimum of 3 years of experience with Hive, YARN, and HDFS, preferably on Hortonworks Data Platform.
- Good implementation experience with object-oriented programming (OOP) concepts.
- Hands-on experience writing complex SQL queries and exporting and importing large amounts of data using utilities.
- Ability to build abstracted, modularized, reusable code components.
- Hands-on experience generating and parsing XML and JSON documents and REST API requests/responses. (ref:hirist.tech)

Location: in, IN

Posted Date: 10/9/2024

Contact Information

Contact Human Resources
Aricent

UID: 4890707696

AboutJobs.com does not guarantee the validity or accuracy of the job information posted in this database. It is the job seeker's responsibility to independently review all posting companies, contracts and job offers.