Job Responsibilities:
- Design, develop, and maintain scalable data pipelines and ETL processes on Google Cloud Platform (GCP).
- Implement and optimize data storage solutions using BigQuery, Cloud Storage, and other GCP services.
- Collaborate with data scientists, machine learning engineers, data engineers, and other stakeholders to integrate and deploy machine learning models into production environments.
- Develop and maintain custom deployment solutions for machine learning models using tools such as Kubeflow, AI Platform, and Docker.
- Write clean, efficient, and maintainable code in Python and PySpark for data processing and transformation tasks.
- Ensure data quality, integrity, and consistency through data validation and monitoring processes.
- Apply a deep understanding of the Medallion architecture.
- Develop metadata-driven pipelines and ensure optimal processing of data.
- Use Terraform to manage and provision cloud infrastructure resources on GCP.
- Troubleshoot and resolve production issues related to data pipelines and machine learning models.
- Stay up to date with the latest industry trends and best practices in data engineering, machine learning, and cloud technologies, including data lifecycle management, data pruning, model drift, and model optimization.

Qualifications:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

Skills & Experience:
- Proven experience as a Data Engineer with a focus on Google Cloud Platform (GCP).
- Strong proficiency in Python and PySpark for data processing and transformation.
- Hands-on experience with machine learning model deployment and integration on GCP.
- Familiarity with GCP services such as BigQuery, Cloud Storage, Dataflow, and AI Platform.
- Experience with Terraform for infrastructure as code.
- Experience with containerization and orchestration tools like Docker and Kubernetes.
- Strong problem-solving skills with the ability to troubleshoot complex issues.
- Excellent communication and collaboration skills.
26 Mar 2025; from: gumtree.co.za