# Deep Learning on MNIST and CIFAR-10 (MLP & CNN)
This repository contains my implementation and evaluation of **Multilayer Perceptrons (MLPs)** and **Convolutional Neural Networks (CNNs)** on the **MNIST** and **CIFAR-10** datasets using **PyTorch**.
This project was completed as **Project 3 for CS 6375: Machine Learning**.
## Project Objectives
- Implement MLP and CNN models from scratch using PyTorch
- Explore **multiple architectures** with increasing depth
- Perform **validation-based hyperparameter tuning**
- Compare the performance of **MLPs vs CNNs**
- Evaluate final models on held-out **test sets**
The full project specification is provided in the assignment PDF.
---
## Repository Structure
- `MLP.ipynb`: Multilayer Perceptron experiments (MNIST & CIFAR-10)
- `cnn_final.ipynb`: Convolutional Neural Network experiments
- `project_3.pdf`: Official assignment description
- `README.md`
---
## Models Implemented
### 1. Multilayer Perceptrons (MLP)
MLPs were trained on **flattened images** from both datasets.
Architectures explored:
- **Shallow MLP**: 1 hidden layer (e.g., 128 units)
- **Medium MLP**: 3 hidden layers (e.g., 512 → 256 → 128)
- **Deep MLP**: 5+ hidden layers
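
To give a rough sense of how these architectures scale with flattened inputs, the sketch below counts the weights and biases of a fully connected network. It assumes 10 output classes and reuses the example hidden sizes listed above; the helper name `mlp_param_count` is hypothetical and is not taken from the notebooks.

```python
def mlp_param_count(input_dim, hidden_sizes, num_classes=10):
    """Count weights and biases of a fully connected network."""
    sizes = [input_dim, *hidden_sizes, num_classes]
    # Each layer has (fan_in * fan_out) weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))


MNIST_DIM = 28 * 28        # 784 features after flattening
CIFAR10_DIM = 32 * 32 * 3  # 3072 features after flattening

shallow_mnist = mlp_param_count(MNIST_DIM, [128])
medium_mnist = mlp_param_count(MNIST_DIM, [512, 256, 128])
shallow_cifar = mlp_param_count(CIFAR10_DIM, [128])
```

Each consecutive pair of sizes corresponds to one `torch.nn.Linear(fan_in, fan_out)` layer, so the same list of sizes could also drive model construction. The same shallow architecture needs roughly four times as many parameters on CIFAR-10 as on MNIST, purely because of the larger flattened input.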
### 2. Convolutional Neural Networks (CNN)
CNNs were trained directly on image tensors.
Architectures explored:
- **Baseline CNN**: 2 convolutional layers + pooling + FC
- **Enhanced CNN**: Batch Normalization + Dropout
- **Deeper CNN**: 3+ convolutional layers with normalization and dropout
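
When wiring convolutional blocks to the fully connected layer, the flattened feature count has to match the spatial size left after pooling. The sketch below works through that arithmetic under stated assumptions: 3×3 convolutions with stride 1 and padding 1, each followed by 2×2 max pooling with stride 2, and 64 output channels in the last block. These settings and the helper names are illustrative and may differ from what `cnn_final.ipynb` uses.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - kernel) // stride + 1


def flattened_features(image_size, num_blocks, out_channels):
    """Features entering the first FC layer after `num_blocks` conv+pool blocks."""
    size = image_size
    for _ in range(num_blocks):
        size = pool_out(conv_out(size))
    return out_channels * size * size


# Two conv+pool blocks, as in the baseline CNN described above.
mnist_fc_in = flattened_features(28, num_blocks=2, out_channels=64)  # 28 -> 14 -> 7
cifar_fc_in = flattened_features(32, num_blocks=2, out_channels=64)  # 32 -> 16 -> 8
```

Under these assumptions the first fully connected layer would take 64 × 7 × 7 inputs on MNIST and 64 × 8 × 8 inputs on CIFAR-10.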
---
## Datasets & Preprocessing
- **MNIST** (28×28 grayscale images)
- **CIFAR-10** (32×32 RGB images)
Preprocessing steps:
- Loaded using `torchvision.datasets`
- Pixel normalization applied
- Dataset split into **training / validation / test**
  - MNIST: 50k train / 10k validation
  - CIFAR-10: 45k train / 5k validation
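
The split sizes above can be reproduced with a seeded shuffle of sample indices. This is a minimal sketch, assuming a random split of the official training sets (60,000 images for MNIST, 50,000 for CIFAR-10); the function name `split_indices` and the seed are hypothetical, and the notebooks may split the data differently.

```python
import random


def split_indices(num_samples, num_val, seed=0):
    """Shuffle indices once and carve off a validation subset."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    return indices[num_val:], indices[:num_val]


mnist_train_idx, mnist_val_idx = split_indices(60_000, 10_000)
cifar_train_idx, cifar_val_idx = split_indices(50_000, 5_000)
```

In PyTorch, index lists like these can be passed to `torch.utils.data.Subset`, or the same effect can be achieved with `torch.utils.data.random_split`.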
---
## Hyperparameter Tuning
For each architecture and dataset, multiple configurations were tested using a validation set.
Tuned parameters:
- Learning rate: `{0.01, 0.001, 0.0001}`
- Batch size: `{32, 64, 128}`
- Optimizer: `SGD`, `Adam`
- Dropout rate: `{0.2, 0.5}`
Instead of exhaustive search, **10 to 12 meaningful configurations** were explored per architecture.
The best model was selected based on **validation accuracy**, then retrained on combined training + validation data.
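
The full grid over the tuned parameters contains 3 × 3 × 2 × 2 = 36 combinations, so exploring 10 to 12 of them covers about a third of the space. The sketch below draws a subset by seeded random sampling and picks a winner by validation accuracy. Random sampling is a stand-in here: the configurations in this project were chosen as "meaningful" ones, which may well mean hand-picked. The names `SEARCH_SPACE`, `sample_configs`, and `select_best` are hypothetical.

```python
import itertools
import random

SEARCH_SPACE = {
    "lr": [0.01, 0.001, 0.0001],
    "batch_size": [32, 64, 128],
    "optimizer": ["SGD", "Adam"],
    "dropout": [0.2, 0.5],
}


def sample_configs(space, k, seed=0):
    """Draw k distinct configurations from the full grid."""
    keys = list(space)
    grid = [dict(zip(keys, values))
            for values in itertools.product(*space.values())]
    return random.Random(seed).sample(grid, k)


def select_best(val_accuracies):
    """Return the index of the highest validation accuracy."""
    return max(range(len(val_accuracies)), key=val_accuracies.__getitem__)


configs = sample_configs(SEARCH_SPACE, k=12)
```

After training each sampled configuration, `select_best` applied to the list of validation accuracies gives the index of the configuration to retrain on the combined training and validation data.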
---
## Results
Results are reported in the same format as required by the assignment:
- Validation Accuracy (± standard deviation)
- Runtime (minutes)
- Final Test Accuracy
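
A "mean ± standard deviation" entry can be produced from a list of validation accuracies as sketched below. This assumes the spread is taken over several runs or configurations and uses the sample standard deviation; how the notebooks actually compute it is not stated here. The helper name `summarize_accuracy` is hypothetical, and the input numbers are placeholders rather than results from this project.

```python
import statistics


def summarize_accuracy(accuracies):
    """Format accuracies (fractions in [0, 1]) as 'mean ± std' in percent."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # sample standard deviation (n - 1)
    return f"{100 * mean:.2f} ± {100 * std:.2f}"


# Placeholder numbers for illustration only, not results from this project.
example = summarize_accuracy([0.97, 0.98, 0.99])
```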
Separate result tables were produced for:
- **MNIST**
- **CIFAR-10**
Key observation:
- CNNs significantly outperform MLPs, especially on **CIFAR-10**, due to their ability to capture spatial features.
---
## How to Run
### Requirements
```bash
# Install dependencies
pip install torch torchvision matplotlib numpy
# Launch the notebooks
jupyter notebook MLP.ipynb
jupyter notebook cnn_final.ipynb
```
Note: A GPU (Google Colab recommended) significantly reduces training time.
---
## Key Takeaways
- Deeper models do not guarantee better performance without proper regularization
- CNNs are far more effective than MLPs for image-based tasks
- Batch normalization and dropout improve stability and generalization
- Validation-based tuning is critical for fair model comparison
---
## References
- PyTorch Documentation: https://pytorch.org/docs/stable/
- MNIST Dataset
- CIFAR-10 Dataset