Description:
Minimum Requirements:
Bachelor's degree in Computer Science or Engineering (or similar)
AWS Certified Data Engineer or AWS Certified Solutions Architect or AWS Certified Data Analyst
5+ years' experience in a similar role
Strong skills in Python (especially PySpark for AWS Glue)
Strong knowledge of data modeling, schema design and database optimization
Proficiency with AWS and infrastructure as code
Knowledge of SQL, Python and AWS serverless microservices, and of deploying and managing ML models in production
Version control (Git), unit testing and agile methodologies
Required Experience:
Data engineering development
Experience with AWS services used for data warehousing, computing and transformations, i.e. AWS Glue (crawlers, jobs, triggers, and catalog), AWS S3, AWS Lambda, AWS Step Functions, AWS Athena and AWS CloudWatch
Experience with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, DynamoDB)
Experience with SQL for querying and transformation of data
Key Accountabilities:
Automate data workflows and ensure they are fault-tolerant and optimized
Implement logging, monitoring, and alerting for data pipelines
Optimize ETL job performance by tuning configurations and analyzing resource usage
Optimize data storage solutions for performance, cost and scalability
Ensure AWS resources are optimized for scalable data ingestion and output
Deploy machine learning models into production using cloud-based services such as AWS SageMaker
Design, develop and optimize scalable ETL pipelines using batch and real-time processing frameworks (using AWS Glue and PySpark)
Implement data extraction, transformation and loading processes from various structured and unstructured sources
Optimize ETL jobs for performance, cost efficiency and scalability
Develop and integrate APIs to ingest and export data between various source and target systems, ensuring seamless ETL workflows
Enable scalable deployment of ML models by integrating data pipelines with ML workflows
Design and maintain scalable data architectures using AWS services including, but not limited to, AWS S3, AWS Glue and AWS Athena
Implement data partitioning and cataloging strategies to enhance data organization and accessibility
Work with schema evolution and versioning to ensure data consistency
Develop and manage metadata repositories and data dictionaries
Assist with and support the definition, setup, and maintenance of data access roles and privileges
24 Mar 2025; from: gumtree.co.za