Code-r4Life/SignAI

✋ SignAI: AI-Powered Indian Sign Language Transcription System

Python TensorFlow React Flask MediaPipe GCN

AI-Powered Indian Sign Language Recognition using MediaPipe, GCNs, BiLSTMs & LLMs


👨‍💻 Team Byte_Coders

Team Lead

  • Shinjan Saha

Members

  • Satyabrata Das Adhikari
  • Sayan Sk

📌 Project Overview

SignAI is a modern AI-powered Indian Sign Language (ISL) transcription platform that converts sign language gestures into readable human language using:

  • 🧠 Graph Convolutional Networks (GCN)
  • 🔄 Bidirectional LSTMs
  • ✋ MediaPipe skeletal landmark extraction
  • 🤖 LLM-powered sentence reconstruction

The system supports:

✔️ Real-time webcam-based sign detection

✔️ Video upload inference

✔️ Spatial skeletal understanding

✔️ Temporal gesture modeling

✔️ Human-readable sentence generation

✔️ Premium UI-based interaction


⚡ Key Features

| Feature | Description |
| --- | --- |
| 🎥 Real-Time Detection | Webcam-based live ISL transcription |
| 📁 Video Inference | Upload videos for offline prediction |
| 🧠 GCN + BiLSTM | Spatial + temporal deep learning pipeline |
| ✋ MediaPipe Holistic | Hand & pose skeletal landmark extraction |
| 🤖 AI Sentence Generation | LLM reconstructs natural human sentences |
| 📊 Confidence Visualization | Top-k prediction probabilities |
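The top-k confidence output listed above can be sketched as follows. This is a minimal illustration, not the repository's code: the class names and probabilities are made up, and `top_k_predictions` is a hypothetical helper.

```python
import numpy as np

def top_k_predictions(probabilities, class_names, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = np.asarray(probabilities, dtype=float)
    k = min(k, probs.size)
    top_idx = np.argsort(probs)[::-1][:k]   # indices sorted by descending probability
    return [(class_names[i], float(probs[i])) for i in top_idx]

# Hypothetical class names and softmax output, for illustration only.
labels = ["hello", "thank_you", "good_morning", "sorry"]
softmax_output = [0.05, 0.70, 0.20, 0.05]
top2 = top_k_predictions(softmax_output, labels, k=2)
print(top2)
```

The returned pairs can be passed directly to a frontend for display as a ranked confidence list.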

🏗️ AI Architecture

1️⃣ MediaPipe Landmark Extraction

The system extracts:

  • Left hand landmarks
  • Right hand landmarks
  • Pose landmarks

Each video becomes:

60 Frames × 258 Features

The pipeline includes:

  • Relative normalization
  • Wrist-origin scaling
  • Shoulder-width normalization
  • Temporal padding/truncation
  • Augmentation
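The padding and normalization steps above might look roughly like the sketch below. The feature layout is an assumption: 258 is consistent with 33 pose landmarks × 4 values (x, y, z, visibility) plus 2 hands × 21 landmarks × 3 values, but the README does not state the order or the exact normalization formulas. `pad_or_truncate` and `normalize_frame` are hypothetical names.

```python
import numpy as np

SEQ_LEN = 60          # frames per clip (from the README)
NUM_FEATURES = 258    # features per frame (from the README)
POSE_SIZE = 33 * 4    # assumed: x, y, z, visibility per pose landmark
HAND_SIZE = 21 * 3    # assumed: x, y, z per hand landmark

def pad_or_truncate(frames, seq_len=SEQ_LEN):
    """Force a (T, 258) clip to exactly seq_len frames using zero padding."""
    frames = np.asarray(frames, dtype=np.float32)
    if len(frames) >= seq_len:
        return frames[:seq_len]
    padding = np.zeros((seq_len - len(frames), frames.shape[1]), dtype=np.float32)
    return np.concatenate([frames, padding], axis=0)

def normalize_frame(frame):
    """Center each hand on its wrist and scale hands by shoulder width."""
    pose = frame[:POSE_SIZE].reshape(33, 4)
    left = frame[POSE_SIZE:POSE_SIZE + HAND_SIZE].reshape(21, 3)
    right = frame[POSE_SIZE + HAND_SIZE:].reshape(21, 3)
    # MediaPipe Pose indices 11 and 12 are the left and right shoulders.
    shoulder_width = np.linalg.norm(pose[11, :3] - pose[12, :3])
    scale = shoulder_width if shoulder_width > 1e-6 else 1.0
    left = (left - left[0]) / scale       # landmark 0 of a hand is the wrist
    right = (right - right[0]) / scale
    return np.concatenate([pose.ravel(), left.ravel(), right.ravel()])

rng = np.random.default_rng(0)
clip = rng.random((45, NUM_FEATURES)).astype(np.float32)   # short synthetic clip
padded = pad_or_truncate(clip)
normalized = np.stack([normalize_frame(f) for f in padded])
print(normalized.shape)
```

Dividing by shoulder width makes hand coordinates roughly independent of how far the signer stands from the camera, which is the usual motivation for this kind of scaling.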

2️⃣ Graph Construction

The skeletal graph consists of:

| Component | Joints |
| --- | --- |
| Left Hand | 21 |
| Right Hand | 21 |
| Pose | 33 |
| Total | 75 Nodes |

Edges model:

  • Finger connections
  • Palm structure
  • Pose skeleton
  • Cross-body wrist links
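A 75-node adjacency matrix of this kind could be assembled as in the sketch below. Several things here are assumptions: the node ordering (left hand, right hand, pose), the reduced set of pose edges, and the reading of "cross-body wrist links" as edges between each pose wrist and the matching hand wrist. The hand edges follow the standard 21-point MediaPipe hand topology.

```python
import numpy as np

NUM_NODES = 75
LEFT_HAND, RIGHT_HAND, POSE = 0, 21, 42   # assumed node offsets

# Finger and palm connections of the 21-point MediaPipe hand model.
HAND_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 4),          # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),          # index finger
    (5, 9), (9, 10), (10, 11), (11, 12),     # middle finger
    (9, 13), (13, 14), (14, 15), (15, 16),   # ring finger
    (13, 17), (17, 18), (18, 19), (19, 20),  # little finger
    (0, 17),                                 # palm base
]
# Illustrative subset of the pose skeleton: shoulders, elbows, wrists.
POSE_EDGES = [(11, 12), (11, 13), (13, 15), (12, 14), (14, 16)]

def build_adjacency():
    """Return a symmetric 0/1 adjacency matrix over all 75 joints."""
    a = np.zeros((NUM_NODES, NUM_NODES), dtype=np.float32)

    def connect(i, j):
        a[i, j] = a[j, i] = 1.0

    for i, j in HAND_EDGES:
        connect(LEFT_HAND + i, LEFT_HAND + j)
        connect(RIGHT_HAND + i, RIGHT_HAND + j)
    for i, j in POSE_EDGES:
        connect(POSE + i, POSE + j)
    # Link pose wrists (15 = left, 16 = right) to the hand wrist nodes.
    connect(POSE + 15, LEFT_HAND + 0)
    connect(POSE + 16, RIGHT_HAND + 0)
    return a

def normalize_adjacency(a):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = a + np.eye(len(a), dtype=a.dtype)
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

adjacency = build_adjacency()
a_norm = normalize_adjacency(adjacency)
print(adjacency.shape, int(adjacency.sum()) // 2, "undirected edges")
```

Adding self-loops before normalizing keeps each joint's own features in the aggregation and avoids division by zero for joints that have no edges in this reduced skeleton.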

3️⃣ Graph Convolutional Network (GCN)

GCNs learn spatial relationships between joints.

The model extracts:

  • Finger articulation patterns
  • Relative body geometry
  • Hand-shape semantics
  • Spatial gesture understanding

Architecture:

GCN Layer 1 → 64 Features
GCN Layer 2 → 128 Features
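To make the two-layer shape progression concrete, here is a NumPy sketch of a graph convolution forward pass using the common propagation rule ReLU(Â H W). It is not the repository's Keras layer: the weights are random and untrained, the graph is a placeholder chain, and 3 input features per joint (x, y, z) is an assumption.

```python
import numpy as np

def gcn_layer(h, a_norm, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return np.maximum(a_norm @ h @ w, 0.0)

rng = np.random.default_rng(0)
num_nodes, in_features = 75, 3          # 3 = (x, y, z) per joint, an assumption

# Placeholder graph: a chain with self-loops, row-normalized.
a = np.eye(num_nodes) + np.eye(num_nodes, k=1) + np.eye(num_nodes, k=-1)
a_norm = a / a.sum(axis=1, keepdims=True)

w1 = rng.normal(scale=0.1, size=(in_features, 64))   # untrained weights
w2 = rng.normal(scale=0.1, size=(64, 128))

joints = rng.normal(size=(num_nodes, in_features))   # one frame of joints
h1 = gcn_layer(joints, a_norm, w1)    # (75, 64)
h2 = gcn_layer(h1, a_norm, w2)        # (75, 128)
print(h1.shape, h2.shape)
```

Each layer mixes a joint's features with those of its neighbors, so after two layers every joint has seen information from joints up to two edges away.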

4️⃣ Bidirectional LSTM

BiLSTMs model temporal gesture dynamics.

Architecture:

BiLSTM (256 Units)
        ↓
BiLSTM (128 Units)

This captures:

  • Gesture motion flow
  • Timing relationships
  • Sequential dependencies
  • Future + past context
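The "future + past context" point can be illustrated with a small NumPy BiLSTM: one LSTM reads the sequence forward, another reads it backward, and their outputs are concatenated per frame. This is a sketch with random, untrained weights. It assumes the unit counts are per direction (as with Keras `Bidirectional(LSTM(n))`, whose default concatenation yields 2n features) and that each frame has been reduced to a 128-dimensional vector before this stage, which the README does not specify.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(input_dim, units, rng):
    """Random (untrained) weights for the four LSTM gates, stacked."""
    return {
        "w": rng.normal(scale=0.1, size=(input_dim, 4 * units)),
        "u": rng.normal(scale=0.1, size=(units, 4 * units)),
        "b": np.zeros(4 * units),
    }

def lstm_forward(x, params):
    """Run an LSTM over x of shape (T, input_dim); return (T, units)."""
    units = params["u"].shape[0]
    h = np.zeros(units)
    c = np.zeros(units)
    outputs = []
    for x_t in x:
        z = x_t @ params["w"] + h @ params["u"] + params["b"]
        i, f, g, o = np.split(z, 4)            # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def bilstm_forward(x, fwd_params, bwd_params):
    """Concatenate forward and time-reversed LSTM outputs per frame."""
    fwd = lstm_forward(x, fwd_params)
    bwd = lstm_forward(x[::-1], bwd_params)[::-1]   # re-align to time order
    return np.concatenate([fwd, bwd], axis=-1)

rng = np.random.default_rng(0)
frames = rng.normal(size=(60, 128))   # assumed per-frame embedding
layer1 = bilstm_forward(frames, init_lstm(128, 256, rng), init_lstm(128, 256, rng))
layer2 = bilstm_forward(layer1, init_lstm(512, 128, rng), init_lstm(512, 128, rng))
print(layer1.shape, layer2.shape)
```

Because the backward pass starts from the last frame, the output at any frame combines a summary of everything before it with a summary of everything after it.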

5️⃣ AI Sentence Reconstruction

Most current sign language recognition models predict isolated words rather than full sentences.

SignAI integrates:

Gemini API + LangChain

to convert predicted tokens into:

  • grammatically coherent
  • contextual
  • readable human language

Example:

Predicted Words:
[I, go, market, tomorrow]

LLM Output:
I will go to the market tomorrow.
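One way to phrase the reconstruction request is sketched below. The prompt wording and the `build_reconstruction_prompt` helper are hypothetical; the actual Gemini and LangChain calls are left out because the repository's chain is not shown here.

```python
def build_reconstruction_prompt(tokens):
    """Turn predicted sign words into a single instruction for an LLM."""
    words = ", ".join(tokens)
    return (
        "The following words were recognized from Indian Sign Language, "
        "in signing order. Rewrite them as one grammatical English sentence "
        "without adding new information.\n"
        f"Words: {words}\n"
        "Sentence:"
    )

prompt = build_reconstruction_prompt(["I", "go", "market", "tomorrow"])
print(prompt)
```

The resulting string would be sent to the LLM, and the model's reply would be shown to the user as the reconstructed sentence.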

📊 Training Pipeline

The training workflow includes:

✔️ Frame-level augmentation

✔️ Temporal augmentation

✔️ Landmark normalization

✔️ Landmark-level augmentation

✔️ Class balancing

✔️ Early stopping

✔️ Learning rate scheduling
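The README does not say how class balancing is done. One common approach, shown here purely as an assumption, is inverse-frequency class weights computed as n_samples / (n_classes × class_count). The label distribution below is invented.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}

# Hypothetical label distribution, for illustration only.
labels = ["hello"] * 6 + ["sorry"] * 3 + ["thanks"] * 1
weights = balanced_class_weights(labels)
for name, weight in sorted(weights.items()):
    print(f"{name}: {weight:.3f}")
```

Once labels are mapped to integer class indices, a dictionary like this can be passed as `class_weight` to Keras `Model.fit` so that rare signs contribute more to the loss.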


⚙️ Training Configuration

| Parameter | Value |
| --- | --- |
| Sequence Length | 60 |
| Input Features | 258 |
| Optimizer | Adam |
| Loss Function | Categorical Crossentropy |
| Epochs | 100 |
| Batch Size | 16 |

📈 Output Visualizations

🔥 Training History

outputs/training_history.png

Tracks:

  • Training accuracy
  • Validation accuracy
  • Training loss
  • Validation loss

📊 Confusion Matrix

outputs/confusion_matrix.png

Displays:

  • Class-level predictions
  • Misclassification patterns
  • Generalization performance
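For reference, the counts behind such a plot can be computed as in the sketch below. The labels are synthetic and `confusion_matrix` is a hypothetical helper, not the code that produced `outputs/confusion_matrix.png`.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

# Synthetic integer labels for three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(cm)
print(per_class_recall)
```

Off-diagonal cells show which signs are mistaken for which, and the per-row recall highlights classes the model recognizes poorly.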

🌐 Full Stack Architecture

Frontend Webcam Stream
        ↓
Flask Backend API
        ↓
MediaPipe Landmark Extraction
        ↓
GCN + BiLSTM Inference
        ↓
Prediction Sequence
        ↓
Gemini Sentence Reconstruction
        ↓
Frontend Output

🛠️ Tech Stack

Frontend

| Technology | Purpose |
| --- | --- |
| React + Vite | Frontend framework |
| Tailwind CSS | Styling |
| Framer Motion | Animations |
| Axios | API communication |

Backend

| Technology | Purpose |
| --- | --- |
| Flask | Backend API |
| TensorFlow / Keras | Deep learning |
| OpenCV | Video processing |
| MediaPipe | Landmark extraction |
| NumPy | Numerical computation |

AI / ML

| Technology | Purpose |
| --- | --- |
| GCN | Spatial skeletal learning |
| BiLSTM | Temporal sequence modeling |
| LangChain | LLM orchestration |
| Gemini API | Sentence generation |

📁 Repository Structure

SignAI/
│
├── backend/                                # Flask backend API + inference server
│   ├── app.py                              # Main backend server and prediction routes
│   └── requirements.txt
│
├── frontend/                               # React + Vite frontend application
│   ├── src/
│   │   ├── components/                     # Reusable UI components
│   │   │
│   │   ├── pages/                          # Main application pages
│   │   │   ├── Home.jsx                    # Landing page
│   │   │   ├── About.jsx                   # Project overview page
│   │   │   ├── Detection.jsx               # Real-time sign detection page
│   │   │   └── Signs.jsx                   # Supported sign classes page
│   │   │
│   │   ├── App.jsx                         # Main React application
│   │   └── main.jsx                        # React entry point
│   │
│   ├── package.json                        # Frontend dependencies
│   ├── vite.config.js
│   └── tailwind.config.js
│
├── outputs/                                # Training and evaluation outputs
│   ├── confusion_matrix.png                # Confusion matrix visualization
│   ├── training_history.png                # Accuracy/loss training curves
│   ├── processed_landmarks.pkl             # Preprocessed landmark dataset
│   ├── best_isl_model.keras                # Saved trained BiLSTM model
│   └── best_isl_gcn_model.keras            # Best trained GCN model
│
├── Indian Sign Language Greetings Dataset/ # Dataset directory
│
├── ISL_transcription.ipynb                 # Complete training + research notebook
├── live_detection.py                       # Webcam-based live detection script
├── README.md                               # Project documentation
└── requirements.txt                        # Root dependencies

🚀 Local Setup

1️⃣ Clone Repository

git clone https://github.com/Code-r4Life/SignAI.git
cd SignAI

⚙️ Backend Setup

cd backend
pip install -r requirements.txt

Create a .env file in the backend directory:

GEMINI_API_KEY=your_api_key_here
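How `app.py` reads this key is not shown in the README. A minimal standard-library sketch is given below; `load_env_file` and `get_gemini_api_key` are hypothetical helpers, and a real backend may well use a package such as python-dotenv instead.

```python
import os

def load_env_file(path=".env"):
    """Copy KEY=VALUE lines from a .env file into os.environ (if absent)."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue   # skip blanks, comments, and malformed lines
            key, value = line.split("=", 1)
            os.environ.setdefault(key.strip(), value.strip())

def get_gemini_api_key():
    """Return the API key, failing early with a clear message if missing."""
    load_env_file()
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to backend/.env")
    return key
```

Failing at startup when the key is missing is usually easier to debug than a failed LLM request later in the pipeline.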

Run backend:

python app.py

🌐 Frontend Setup

cd frontend
npm install
npm run dev

🎥 Run Live Detection

python live_detection.py

Controls:

SPACE → Start / Stop Recording
Q → Quit Application

🖼️ Training & Evaluation Visualizations

📊 Training History

The following graph shows:

  • Training Accuracy
  • Validation Accuracy
  • Training Loss
  • Validation Loss

This helps analyze:

  • model convergence
  • overfitting
  • learning stability

[Image: Training History, outputs/training_history.png]

🔥 Confusion Matrix

The confusion matrix visualizes:

  • class-wise predictions
  • model confusion patterns
  • classification performance

[Image: Confusion Matrix, outputs/confusion_matrix.png]

📬 Interested in a Similar Project?

I build smart, ML-integrated applications and responsive web platforms. Let's build something powerful together!

📧 shinjansaha00@gmail.com

🔗 LinkedIn Profile
