Celimpilo Promise Gumede

Data Engineer • SQL Developer • ETL Developer • ML Engineer • AI Engineer
Johannesburg, South Africa

About Me

Mathematics and Applied Mathematics graduate with specialized certifications in Data Engineering, Machine Learning, and Artificial Intelligence. Skilled in designing scalable ETL/ELT pipelines, building data warehouses, and preparing ML-ready datasets. Experienced with cloud platforms (AWS, Azure) and modern data stack tools including Apache Spark, Airflow, and dbt. Passionate about solving complex data problems, optimizing query performance, and building robust data infrastructure that powers business intelligence and AI applications.

Currently available for opportunities in Data Engineering, SQL Development, and AI/ML Engineering.

Professional Experience

Data Engineering Intern • Jan 2026 – Present
Capaciti • Johannesburg
  • Build and optimize ETL pipelines with Apache Spark and Airflow that process 1M+ records daily
  • Leverage AWS and Azure for scalable data storage and processing solutions
  • Conduct quantitative performance analysis of pipeline efficiency and implement optimizations
  • Collaborate on real-world data integration and data cleansing projects

Mathematics Tutor • 2023 – 2024
University of KwaZulu-Natal • Durban
  • Tutored 200+ students in Calculus, Linear Algebra, and Statistics
  • Created Python-based visualizations to simplify quantitative concepts
  • Improved student pass rates by 25% through tailored, data-driven teaching methods

Technical Expertise

Data Engineering & ETL

ETL/ELT • Apache Spark • Airflow • dbt • Kafka • Data Warehousing

Databases & SQL

PostgreSQL • MySQL • MongoDB • Snowflake • Data Modeling • Query Optimization

Programming

Python • SQL • Java • PySpark • FastAPI • Git

Cloud & Infrastructure

AWS • Azure • Docker • CI/CD

Machine Learning & AI

Regression • Neural Networks • NLP • LangChain • RAG Systems • OpenAI APIs

Data Visualization

Power BI • Tableau • Matplotlib • Seaborn

Featured Projects

Weather Data Pipeline

Apache Spark • Airflow • AWS • Python

Automated ETL pipeline for real-time weather data ingestion, transformation, and dashboard visualization. Optimized for 50% faster query performance.

Smart City IoT Pipeline

Kafka • Spark • Azure • Delta Lake

Real-time streaming analytics platform that processes IoT sensor data, detecting anomalies and raising alerts within 5 seconds of an event.

Energy Consumption Pipeline

Python • Time Series • Power BI • Forecasting

End-to-end data pipeline for energy consumption forecasting using statistical models and time-series analysis with interactive dashboards.

Bright DE AI Assistant

LangChain • RAG • OpenAI • FastAPI

LLM-powered assistant for data engineering education with retrieval-augmented generation and multilingual search capabilities.

Multilingual Bible AI

NLP • Translation • Python • JavaScript

AI-powered Bible study assistant with multilingual search, progress tracking, and natural language understanding across multiple translations.

ETL Quantitative Analysis

Spark • Airflow • Python • Statistics

Performance measurement and optimization of ETL pipelines using statistical methods to analyze throughput, latency, and resource utilization.

Education & Certifications

Education

Bachelor of Science (Mathematics & Applied Mathematics) • University of KwaZulu-Natal
2022 – 2025
Dean's List 2023, 2024 • Mathematics Department Award 2024 • Medical Physics (UFS)

Professional Certifications

Data Engineering Capstone Project
IBM • 2026
Machine Learning with Apache Spark
IBM • 2026
Generative AI: Data Engineering Career
IBM • 2026
Advanced Data Modeling
Meta • 2026
Introduction to NoSQL Databases
IBM • 2026
ETL and Data Pipelines (Airflow/Kafka)
IBM • 2026
Data Warehouse Fundamentals
IBM • 2026
Introduction to Generative AI
Google Cloud • 2025
