Analytics Engineer with 1.5+ years of experience building scalable ELT pipelines, real-time streaming architectures, and ML feature stores. Currently at GlobalLogic, engineering ETL pipelines that process diverse datasets (JSON, PDFs, images, Parquet) from client APIs into Google Knowledge Graph.
I'm passionate about building reliable, well-documented data infrastructure that supports ML model training and self-serve analytics at scale.
- ⚡ Current Project: Crypto Real-Time Data Platform, a real-time crypto streaming pipeline: CoinGecko → Kafka (AWS MSK) → Snowflake → dbt → Great Expectations · Orchestrated by Airflow
- ⏮ Previous Project: EURO 2024 Real-Time Data Warehouse Streaming (Apache Kafka + Airflow + Apache Pinot + Superset) · 40% efficiency improvement
- 🧠 ML Pipeline: AI Feature Store & ML Training Pipeline (Airflow + dbt + Snowflake + AWS S3) · Reduced ML data prep time by 70%
- 📊 Analytics Platform: Self-Service Analytics Platform (dbt + Snowflake + Apache Superset) · 4 business domain dashboards
- ⏩ SQL Practice: 50 SQL LeetCode Problems
- Crypto Real-Time Data Platform — Production-grade streaming pipeline on AWS (MSK Kafka + Lambda + Snowflake + dbt + Great Expectations + Airflow)
- dbt Fundamentals Certification — dbt Labs
- AWS Cloud Practitioner — Certification in progress
- 🌐 Portfolio: datascienceportfol.io/evansajumathew
- 📄 Resume: Google Drive

