A Data Engineer who loves building, breaking, and learning things 💻
About Me
- I'm currently working on my major project: a cybersecurity project for automated threat detection using Docker and AI workflow automation
- 🌱 I'm currently learning: workflow orchestration, data pipelines, Amazon Web Services
- 💡 I love solving real-world problems with code
- ⚡ Fun fact: I enjoy debugging more than writing initial code
Programming Languages:
SQL • NoSQL • Python • Java
Data and Machine Learning Libraries:
Pandas • NumPy • Scikit-learn
Visualization Tools:
Microsoft Power BI • Tableau • Seaborn • Plotly • Matplotlib
Database:
MongoDB • MongoDB Compass • MySQL • MySQL Server
Tools:
Git • GitHub • Google Antigravity • Visual Studio Code • Apache Spark • Apache Airflow • Apache Kafka • Databricks
An extended data analysis project on the Amazon Prime dataset, covering titles released from 1920 to 2021.
Worked with a dataset of over 100,000 records spanning different kinds of movies and TV shows.
The project produced a range of analyses of the Amazon Prime data using aggregation pipelines built in MongoDB Compass.
Deployed an automated ETL pipeline to process transactional sales data and performed KPI aggregation, storing the optimized analytics dataset in Delta Lake.
Used tools such as Apache Airflow for workflow orchestration and stored the processed data as Delta tables.
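A minimal sketch of the KPI aggregation step in plain Python, kept dependency-free for illustration. It is not the pipeline's actual code: the sample records, the field names (`region`, `quantity`, `unit_price`), and the helper functions `transform` and `aggregate_kpis` are all hypothetical, and the Airflow scheduling and Delta Lake storage are left out.

```python
from collections import defaultdict

# Hypothetical sample of transactional sales records (not real project data).
SALES = [
    {"order_id": 1, "region": "East", "quantity": 2, "unit_price": 50.0},
    {"order_id": 2, "region": "West", "quantity": 1, "unit_price": 120.0},
    {"order_id": 3, "region": "East", "quantity": 3, "unit_price": 20.0},
    {"order_id": 4, "region": "West", "quantity": None, "unit_price": 75.0},  # bad row
]


def transform(rows):
    """Drop rows with missing values and add a revenue field to the rest."""
    clean = []
    for row in rows:
        if row["quantity"] is None or row["unit_price"] is None:
            continue
        clean.append({**row, "revenue": row["quantity"] * row["unit_price"]})
    return clean


def aggregate_kpis(rows):
    """Compute order count, total revenue, and average order value per region."""
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for row in rows:
        bucket = totals[row["region"]]
        bucket["orders"] += 1
        bucket["revenue"] += row["revenue"]
    return {
        region: {**kpi, "avg_order_value": kpi["revenue"] / kpi["orders"]}
        for region, kpi in totals.items()
    }


kpis = aggregate_kpis(transform(SALES))
```

In an orchestrated setup, each of these functions would typically become its own task so that a failed step can be retried without redoing the earlier ones.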
The project focuses on analyzing company sales and revenue data using MongoDB. The system is designed to store, process, and analyze large volumes of business transaction data efficiently.
Using MongoDB aggregation pipelines, the project generates business insights such as revenue trends, product performance, and regional sales analysis, visualized in Microsoft Power BI.
The primary goal of this project is to demonstrate how MongoDB can be used for real-world business analytics and decision-making processes.
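To give a flavor of this kind of analysis, here is a rough sketch of a revenue-by-region aggregation written as a Python list of pipeline stages. The collection name `sales`, the field names (`year`, `region`, `quantity`, `unit_price`), and the function `revenue_by_region_pipeline` are assumptions for illustration, not the project's actual schema.

```python
def revenue_by_region_pipeline(year):
    """Build pipeline stages: filter one year, then sum revenue per region."""
    return [
        # Keep only documents for the requested year.
        {"$match": {"year": year}},
        # Group by region; revenue per document is quantity * unit_price.
        {"$group": {
            "_id": "$region",
            "total_revenue": {"$sum": {"$multiply": ["$quantity", "$unit_price"]}},
            "orders": {"$sum": 1},
        }},
        # Highest-earning regions first.
        {"$sort": {"total_revenue": -1}},
    ]


pipeline = revenue_by_region_pipeline(2023)

# With pymongo and a running MongoDB instance (assumed, not shown here):
#   results = list(db["sales"].aggregate(pipeline))
```

The `$match` stage comes first so that grouping only touches the documents that are actually needed.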
The project demonstrates the use of Natural Language Processing (NLP) and Machine Learning (ML) algorithms on a real-world problem: identifying genuine reviews.
Two key dimensions are typically targeted: authenticity (detecting fake or bot-generated reviews) and sentiment (determining whether the review expresses positive, negative, or neutral emotions).
Both tasks involve distinct methods but can be integrated for a comprehensive review assessment.
Tools Used: Scikit-learn, Hugging Face Transformers models, Pandas, NumPy, and more.
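A small sketch of how the two dimensions can be combined into one assessment. To stay dependency-free, it swaps in a from-scratch multinomial Naive Bayes classifier instead of the Scikit-learn and Transformers models used in the project. The training reviews, their labels, and the names `NaiveBayes` and `assess` are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set: (text, sentiment, authenticity).
REVIEWS = [
    ("great product works well", "positive", "genuine"),
    ("love it great quality", "positive", "genuine"),
    ("terrible quality broke fast", "negative", "genuine"),
    ("bad product waste of money", "negative", "genuine"),
    ("best best best buy now click link", "positive", "fake"),
    ("amazing amazing click link buy now", "positive", "fake"),
]

texts = [r[0] for r in REVIEWS]
sentiment_model = NaiveBayes().fit(texts, [r[1] for r in REVIEWS])
authenticity_model = NaiveBayes().fit(texts, [r[2] for r in REVIEWS])


def assess(review):
    """Run both classifiers and return one combined assessment."""
    return {
        "sentiment": sentiment_model.predict(review),
        "authenticity": authenticity_model.predict(review),
    }
```

Keeping the two models separate means either one can later be replaced, for example by a transformer-based sentiment model, without touching the other.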
- 🌱 Started learning to code in 2020.
- 💻 Built my first project during the early years of my GitHub journey.
- Contributed 400+ GitHub commits and counting, improving daily with new and meaningful insights.
- GitHub: Sho670 (www.github.com/Sho670)
- HackerRank: Shohom (www.hackerrank.com/Shohom2005)
"Consistency works with Discipline."
Improving my skills one commit at a time, learning from one contribution at a time!
