evanmathew/README.md

evanmathew

🌐 Connect with me:

evansajumathew


💫 About Me

Analytics Engineer with 1.5+ years of experience building scalable ELT pipelines, real-time streaming architectures, and ML feature stores. Currently at GlobalLogic, engineering ETL pipelines that process diverse datasets (JSON, PDFs, images, Parquet) from client APIs into Google Knowledge Graph.

I'm passionate about building reliable, well-documented data infrastructure that supports ML model training and self-serve analytics at scale.


🔗 Explore My Work


🌱 What I'm Currently Building

  • Crypto Real-Time Data Platform — Production-grade streaming pipeline on AWS (MSK Kafka + Lambda + Snowflake + dbt + Great Expectations + Airflow)
  • dbt Fundamentals Certification — dbt Labs
  • AWS Cloud Practitioner — Certification in progress

💼 Portfolio & Resources


💻 Tech Stack

Data Engineering & Streaming

Apache Kafka, Apache Airflow, Apache Spark, dbt

Databases & Warehouses

Snowflake, Postgres, MySQL, Amazon Redshift

Cloud Platforms

AWS, AWS Lambda, Amazon MSK

Programming Languages

Python, Bash Script, SQL

Data Quality & Observability

Great Expectations, dbt Tests

Visualization

Power BI, Apache Superset

DevOps & Version Control

Docker, Git, GitHub, GitHub Actions, Linux

Pinned

  1. ETL-University-Course-Extraction-Using-Spark-Snowflake

    This project automates the extraction of university course details (e.g., schedules, professors, course codes) from text files using regex patterns and a spaCy NLP model, and processes them using PyS…

    Python

  2. euro-2024-kafka-pinot-pipeline

    This project implements a real-time data pipeline for EURO 2024 football data, utilizing Apache Kafka for streaming, Apache Pinot for fast querying, and Apache Superset for data visualization. The …

    Python

  3. Reddit_ETL_DE

    This project demonstrates a complete data pipeline for extracting, transforming, and loading (ETL) Reddit data into an Amazon Redshift data warehouse. The pipeline uses various AWS services and too…

    Python

  4. Data-Analysis-Projects

    This repository hosts multiple data analysis projects, showcasing a variety of real-time and batch processing pipelines. Each project highlights different tools and technologies, offering comprehen…

    Jupyter Notebook

  5. evanmathew

  6. netflix_sql_data_analysis

    This project explores the Netflix dataset using SQL to answer complex analytical questions. It involves data cleansing, aggregation, ranking, and advanced SQL techniques to uncover insights such as…