Description:
Key Responsibilities:
- Architect and build highly scalable distributed systems using open-source tools.
- Design and implement efficient data models, drawing on a deep understanding of data structures and their trade-offs.
- Work with big data batch and streaming tools to process and analyze data at scale.
- Develop and optimize Extract, Transform, Load (ETL) processes for data pipelines.
- Utilize AWS technologies (EMR, EC2, and S3) for data storage and processing.
- Write and maintain clean, efficient code in Python and PySpark (Spark).
- Collaborate with cross-functional teams to define data engineering requirements and deliver technical solutions.
- Ensure the smooth operation, performance, and scalability of data infrastructure.
- Continuously improve the reliability and security of data systems.
Requirements:
- Bachelor's degree in Computer Science, Computer Engineering, or a related field, or equivalent experience.
- AWS certification (e.g., AWS Certified Solutions Architect or AWS Certified Data Analytics).
- Extensive knowledge of programming or scripting languages (e.g., Python, PySpark).
- Expert knowledge of data modelling and a strong understanding of data structures.
- Proven ability to architect highly scalable distributed systems using open-source tools.
- 5+ years of experience in data engineering or software engineering.
- 2+ years of experience with big data technologies.
- 2+ years of experience with ETL processes.
- 2+ years of experience working with AWS services (EMR, EC2, and S3).
- 5+ years of experience with object-oriented design, coding, and testing patterns.
- Hands-on experience with Talend for data integration.
- Strong experience with big data batch and streaming tools.
- Solid background in developing commercial or open-source software platforms and large-scale data infrastructure.
Apply now!
Posted 28 Mar 2025 via gumtree.co.za