Predicts the probability of a college student getting placed based on academic and skill parameters using a custom-built Gaussian Naive Bayes classifier with PCA, served via FastAPI and Streamlit.
This project takes 5 student input parameters and returns the probability of placement using a fully custom ML pipeline — no sklearn used. The model is built from scratch using NumPy, SciPy, and PyTorch-based Maximum Likelihood Estimation (MLE).
Raw Input → PCA (SVD) → PyTorch Gaussian MLE → Naive Bayes → Posterior Probability
- Data Preprocessing — Categorical columns (Yes/No) mapped to binary (1/0). Student ID column cleaned and converted to integer (a minimal sketch follows this list)
- Feature Selection — 5 numeric features selected based on correlation analysis: IQ, Previous Semester Result, CGPA, Communication Skills, Projects Completed
- PCA — Manual implementation using Singular Value Decomposition (SVD) to decorrelate features and reduce multicollinearity. Eigenvectors saved as eigen_vectors.npy
- Gaussian MLE via PyTorch — Custom neural network (GaussianMLEstimatorNN) built in PyTorch that minimizes Negative Log-Likelihood (NLL) using the SGD optimizer to estimate μ (mean) and σ (std) per feature per class. This is the gradient descent approach to MLE — mathematically equivalent to the closed-form solution, but it makes the underlying optimization explicit
- Gaussian Naive Bayes — Bayes' theorem used to compute the posterior probability of placement, with a normalizing constant so the output is a calibrated true probability
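A minimal sketch of the preprocessing step, assuming pandas and the CSV filename from the project structure below. The Yes/No detection is written generically because the actual column names are not listed here, and the ID-cleaning line is hypothetical:

```python
import pandas as pd

df = pd.read_csv("college_student_placement_dataset.csv")

# Map every Yes/No column to binary 1/0
for col in df.select_dtypes(include="object").columns:
    if set(df[col].dropna().unique()) <= {"Yes", "No"}:
        df[col] = df[col].map({"Yes": 1, "No": 0})

# Clean an ID like "CLG0001" down to the integer 1 (column name is hypothetical)
# df["College_ID"] = df["College_ID"].str.extract(r"(\d+)", expand=False).astype(int)
```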
PCA: Covariance matrix → SVD → Eigenvectors project input into decorrelated space
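A minimal NumPy sketch of this step, assuming the 5 numeric features above; function names and the centering detail are illustrative rather than copied from the notebook:

```python
import numpy as np

def fit_pca(X):
    """Eigenvectors of the feature covariance matrix via SVD (columns of U)."""
    Xc = X - X.mean(axis=0)          # center each of the 5 features
    cov = np.cov(Xc, rowvar=False)   # 5 x 5 covariance matrix
    U, S, Vt = np.linalg.svd(cov)    # symmetric matrix: U holds the eigenvectors
    return U

def project(x, eigen_vectors):
    """Project a feature vector into the decorrelated PCA space."""
    return x @ eigen_vectors

# eigen_vectors = fit_pca(X_train)
# np.save("eigen_vectors.npy", eigen_vectors)
```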
Gaussian NLL Loss (PyTorch MLE):
L(μ, σ) = -mean( log( (1/√(2πσ²)) × exp(-(x-μ)²/(2σ²)) ) )
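A simplified, hypothetical stand-in for the notebook's GaussianMLEstimatorNN: per-feature μ and σ for one class are learned by minimizing this NLL with SGD. Parameterizing log σ (to keep σ positive) and the commented training loop are assumptions, not the exact original code:

```python
import math
import torch

class GaussianMLEstimator(torch.nn.Module):
    """Learns per-feature mu and sigma for one class by gradient descent on the NLL."""
    def __init__(self, n_features: int):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.zeros(n_features))
        # log sigma keeps sigma strictly positive during SGD (an implementation assumption)
        self.log_sigma = torch.nn.Parameter(torch.zeros(n_features))

    def nll(self, x: torch.Tensor) -> torch.Tensor:
        sigma = self.log_sigma.exp()
        # -log N(x | mu, sigma), averaged over samples and features
        return (torch.log(sigma) + 0.5 * math.log(2 * math.pi)
                + (x - self.mu) ** 2 / (2 * sigma ** 2)).mean()

# Fit per-class parameters with plain SGD, e.g. for the placed=1 rows:
# model = GaussianMLEstimator(n_features=5)
# opt = torch.optim.SGD(model.parameters(), lr=0.01)
# for _ in range(2000):
#     opt.zero_grad()
#     loss = model.nll(x_placed)   # x_placed: PCA-projected features of that class
#     loss.backward()
#     opt.step()
# After training, model.mu and model.log_sigma.exp() give μ and σ per feature.
```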
Bayes' Theorem:
P(placed | features) = P(features | placed) × P(placed) / P(features)
Normalizing Constant:
P(features) = P(features|placed=1) × P(placed=1)
+ P(features|placed=0) × P(placed=0)
Gaussian Likelihood per feature:
P(xᵢ | class) = (1 / √(2πσ²)) × exp(-(xᵢ - μ)² / (2σ²))
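Putting these formulas together, a minimal NumPy sketch of the normalized posterior; function and parameter names are illustrative, and the default prior uses the dataset value quoted below:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Per-feature Gaussian likelihood P(x_i | class)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def placement_probability(z, mu0, sigma0, mu1, sigma1, prior_placed=0.1659):
    """z: PCA-projected feature vector; (mu, sigma) arrays come from the MLE step."""
    like_1 = np.prod(gaussian_pdf(z, mu1, sigma1))   # P(features | placed=1), naive independence
    like_0 = np.prod(gaussian_pdf(z, mu0, sigma0))   # P(features | placed=0)
    evidence = like_1 * prior_placed + like_0 * (1 - prior_placed)   # normalizing constant
    return like_1 * prior_placed / evidence          # P(placed=1 | features)
```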
| Layer | Technology |
|---|---|
| ML Model | NumPy, SciPy, Pandas, PyTorch |
| Backend | FastAPI, Uvicorn, Pydantic |
| Frontend | Streamlit |
| Notebook | Jupyter |
| Environment | Python 3.12, Ubuntu 24.04 (WSL) |
placement-probability-predictor/
├── ds-project-1.ipynb # ML pipeline notebook
├── backend.py # FastAPI REST API
├── streamlit_app.py # Streamlit frontend
├── config.py # Prior probability and feature constants
├── eigen_vectors.npy # Saved PCA eigenvectors
├── likelihood_distribution_params.pkl # Saved Gaussian MLE parameters
├── college_student_placement_dataset.csv # Dataset
├── requirements.txt # Python dependencies
└── README.md
1. Clone the repo
   git clone https://github.com/Divyanshusinghrajawat/placement-probability-predictor.git
   cd placement-probability-predictor
2. Create virtual environment
   python3 -m venv .venv
   source .venv/bin/activate
3. Install dependencies
   pip install torch --index-url https://download.pytorch.org/whl/cpu
   pip install -r requirements.txt
4. Run the backend (Terminal 1)
   uvicorn backend:app --reload
5. Run the frontend (Terminal 2)
   streamlit run streamlit_app.py
6. Open in browser
   http://localhost:8501
- Source: College Student Placement Dataset
- Rows: Custom dataset
- Features: IQ, Previous Semester Result, CGPA, Communication Skills, Projects Completed, Placement Status
- Target: Placed (0 or 1)
- Prior P(placed=1): 0.1659 (~16.6% placement rate)
| Feature | Description | Range |
|---|---|---|
| IQ | Intelligence Quotient score | 40 – 160 |
| Previous Semester Result | Last semester score out of 10 | 0.0 – 10.0 |
| CGPA | Cumulative Grade Point Average | 0.0 – 10.0 |
| Communication Skills | Self-rated communication score | 0 – 10 |
| Projects Completed | Number of projects done | 0 – 5 |
┌─────────────────┐ HTTP POST ┌──────────────────────┐
│ │ /compute_probability │ │
│ Streamlit UI │ ─────────────────────────► │ FastAPI Backend │
│ (port 8501) │ ◄───────────────────────── │ (port 8000) │
│ │ JSON response │ │
└─────────────────┘ └──────────────────────┘
│
┌──────────▼───────────┐
│ ML Inference │
│ eigen_vectors.npy │
│ params.pkl │
└──────────────────────┘
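A hedged sketch of what the /compute_probability endpoint in backend.py could look like; the Pydantic field names and the response key are assumptions, not the project's actual schema:

```python
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class StudentFeatures(BaseModel):
    # field names are illustrative, not the project's actual schema
    iq: float
    prev_sem_result: float
    cgpa: float
    communication_skills: float
    projects_completed: int

@app.post("/compute_probability")
def compute_probability(f: StudentFeatures):
    x = np.array([f.iq, f.prev_sem_result, f.cgpa,
                  f.communication_skills, f.projects_completed])
    # 1. project x with the saved eigen_vectors.npy
    # 2. evaluate the class-conditional Gaussians from likelihood_distribution_params.pkl
    # 3. return the normalized posterior (placeholder value below)
    return {"placement_probability": 0.0}
```

The Streamlit frontend would then send the form inputs as JSON, e.g. requests.post("http://localhost:8000/compute_probability", json=payload), and render the returned probability.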
- PyTorch for MLE — Used gradient descent (SGD + NLL loss) to estimate Gaussian parameters instead of the closed-form solution, to demonstrate understanding of optimization and loss minimization
- PCA from scratch — Used NumPy SVD directly instead of sklearn's PCA to show understanding of the underlying linear algebra
- Normalized posterior — Backend computes true posterior probabilities using the normalizing constant, not just unnormalized likelihoods
- Client-server architecture — Streamlit frontend and FastAPI backend run as separate services communicating via REST API, mirroring production ML deployment patterns
Divyanshu Singh Rajawat | B.Tech Computer Science, Poornima University | GitHub