- Shinjan Saha
- Satyabrata Das Adhikari
- Sayan Sk
SignAI is a modern AI-powered Indian Sign Language (ISL) transcription platform that converts sign language gestures into readable human language using:
- Graph Convolutional Networks (GCN)
- Bidirectional LSTMs
- MediaPipe skeletal landmark extraction
- LLM-powered sentence reconstruction
The system supports:
- Real-time webcam-based sign detection
- Video upload inference
- Spatial skeletal understanding
- Temporal gesture modeling
- Human-readable sentence generation
- Premium UI-based interaction
| Feature | Description |
|---|---|
| Real-Time Detection | Webcam-based live ISL transcription |
| Video Inference | Upload videos for offline prediction |
| GCN + BiLSTM | Spatial + temporal deep learning pipeline |
| MediaPipe Holistic | Hand & pose skeletal landmark extraction |
| AI Sentence Generation | LLM reconstructs natural human sentences |
| Confidence Visualization | Top-k prediction probabilities |
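
The confidence visualization needs the k most probable classes from the classifier output. A minimal sketch, assuming the model returns one probability per class and that `class_names` (hypothetical labels here) is index-aligned with that output:

```python
def top_k_predictions(probs, class_names, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    k = min(k, len(probs))
    # Sort class indices by probability, highest first, and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(class_names[i], probs[i]) for i in ranked[:k]]

# Hypothetical labels and probabilities, for illustration only.
labels = ["hello", "thank you", "good morning", "sorry"]
print(top_k_predictions([0.05, 0.70, 0.20, 0.05], labels, k=2))
```

The returned pairs can be sent to the frontend as-is and rendered as confidence bars.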
The system extracts:
- Left hand landmarks
- Right hand landmarks
- Pose landmarks
Each video becomes:
60 Frames × 258 Features
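
The 258-value frame vector is consistent with 21 joints × 3 coordinates for each hand plus 33 pose joints × 4 values (x, y, z, visibility), although the README does not state this breakdown. A sketch under that assumption, with `frame_to_features` as a hypothetical helper name and random arrays standing in for real landmarks:

```python
import numpy as np

HAND_JOINTS, POSE_JOINTS = 21, 33
# Assumed layout: (x, y, z) per hand joint, (x, y, z, visibility) per pose joint.
FRAME_FEATURES = 2 * HAND_JOINTS * 3 + POSE_JOINTS * 4  # 126 + 132 = 258

def frame_to_features(left_hand, right_hand, pose):
    """Flatten one frame's landmarks into a single vector.

    Each argument is an array of per-joint values, or None when that body
    part was not detected; missing parts are zero-filled.
    """
    def flat(part, joints, dims):
        if part is None:
            return np.zeros(joints * dims, dtype=np.float32)
        return np.asarray(part, dtype=np.float32).reshape(joints * dims)

    return np.concatenate([
        flat(left_hand, HAND_JOINTS, 3),
        flat(right_hand, HAND_JOINTS, 3),
        flat(pose, POSE_JOINTS, 4),
    ])

# One video: 60 frames stacked into a (60, 258) array (random stand-in data).
rng = np.random.default_rng(0)
video = np.stack([
    frame_to_features(rng.random((21, 3)), None, rng.random((33, 4)))
    for _ in range(60)
])
print(video.shape)  # (60, 258)
```

Zero-filling an undetected hand keeps every frame the same width, which the fixed 60 × 258 input shape requires.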
The pipeline includes:
- Relative normalization
- Wrist-origin scaling
- Shoulder-width normalization
- Temporal padding/truncation
- Augmentation
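
Three of the steps above can be sketched with NumPy. The function names are hypothetical, and the choices to zero-pad at the end and to truncate by keeping the first 60 frames are assumptions; the README does not say which variant SignAI uses.

```python
import numpy as np

def normalize_hand(hand_xyz):
    """Translate a (21, 3) hand so the wrist (joint 0) sits at the origin."""
    hand_xyz = np.asarray(hand_xyz, dtype=np.float32)
    return hand_xyz - hand_xyz[0]

def scale_by_shoulder_width(points_xyz, left_shoulder, right_shoulder, eps=1e-6):
    """Divide coordinates by the shoulder distance so signer size cancels out."""
    width = np.linalg.norm(np.asarray(left_shoulder) - np.asarray(right_shoulder))
    return np.asarray(points_xyz, dtype=np.float32) / max(width, eps)

def pad_or_truncate(sequence, target_len=60):
    """Force a (T, F) sequence to exactly target_len frames."""
    sequence = np.asarray(sequence, dtype=np.float32)
    if len(sequence) >= target_len:
        return sequence[:target_len]
    padding = np.zeros((target_len - len(sequence), sequence.shape[1]), dtype=np.float32)
    return np.concatenate([sequence, padding], axis=0)

print(pad_or_truncate(np.ones((45, 258))).shape)  # (60, 258)
print(pad_or_truncate(np.ones((90, 258))).shape)  # (60, 258)
```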
The skeletal graph consists of:
| Component | Joints |
|---|---|
| Left Hand | 21 |
| Right Hand | 21 |
| Pose | 33 |
| Total | 75 Nodes |
Edges model:
- Finger connections
- Palm structure
- Pose skeleton
- Cross-body wrist links
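
A graph like this is usually handed to a GCN as a normalized adjacency matrix. The sketch below assumes a node order of left hand (0-20), right hand (21-41), pose (42-74), uses MediaPipe's joint indexing for the hand and pose edges, and keeps only an upper-body subset of the pose skeleton. SignAI's actual edge list may differ.

```python
import numpy as np

# Assumed node order: left hand 0-20, right hand 21-41, pose 42-74.
LEFT, RIGHT, POSE, NUM_NODES = 0, 21, 42, 75

# Finger and palm bones in MediaPipe's 21-point hand indexing.
HAND_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 4),          # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),          # index
    (9, 10), (10, 11), (11, 12),             # middle
    (13, 14), (14, 15), (15, 16),            # ring
    (0, 17), (17, 18), (18, 19), (19, 20),   # pinky
    (5, 9), (9, 13), (13, 17),               # palm
]
# Upper-body subset of the pose skeleton (shoulders, elbows, wrists, hips).
POSE_EDGES = [(11, 12), (11, 13), (13, 15), (12, 14), (14, 16),
              (11, 23), (12, 24), (23, 24)]
# Cross-body links: pose wrists 15/16 to the wrist joint (0) of each hand.
WRIST_LINKS = [(POSE + 15, LEFT + 0), (POSE + 16, RIGHT + 0)]

def build_adjacency():
    """Return the symmetric-normalized (75, 75) adjacency with self-loops."""
    edges = [(LEFT + a, LEFT + b) for a, b in HAND_EDGES]
    edges += [(RIGHT + a, RIGHT + b) for a, b in HAND_EDGES]
    edges += [(POSE + a, POSE + b) for a, b in POSE_EDGES]
    edges += WRIST_LINKS
    adj = np.eye(NUM_NODES, dtype=np.float32)  # self-loops
    for a, b in edges:
        adj[a, b] = adj[b, a] = 1.0
    # Symmetric normalization: D^-1/2 (A + I) D^-1/2
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

print(build_adjacency().shape)  # (75, 75)
```

The wrist links are what let information flow between the hand subgraphs and the body pose; without them the graph would split into three disconnected components.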
GCNs learn spatial relationships between joints.
The model extracts:
- Finger articulation patterns
- Relative body geometry
- Hand-shape semantics
- Spatial gesture understanding
Architecture:
GCN Layer 1 → 64 Features
GCN Layer 2 → 128 Features
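
A single graph convolution can be written as ReLU(Â · H · W), where Â is the normalized adjacency and H holds one feature row per joint. The NumPy sketch below only illustrates the 64 and 128 feature widths; the per-joint input size of 3, the identity placeholder for Â, and the random weights are all assumptions, not values taken from the trained model.

```python
import numpy as np

def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return np.maximum(a_hat @ h @ w, 0.0)

rng = np.random.default_rng(0)
num_nodes, in_dim = 75, 3                   # 75 joints, (x, y, z) per joint (assumed)
a_hat = np.eye(num_nodes)                   # placeholder; a real model uses the skeleton graph
h0 = rng.normal(size=(num_nodes, in_dim))   # one frame of joint coordinates (random stand-in)
w1 = rng.normal(size=(in_dim, 64)) * 0.1    # untrained stand-in weights
w2 = rng.normal(size=(64, 128)) * 0.1

h1 = gcn_layer(a_hat, h0, w1)               # (75, 64)
h2 = gcn_layer(a_hat, h1, w2)               # (75, 128)
print(h1.shape, h2.shape)
```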
BiLSTMs model temporal gesture dynamics.
Architecture:
BiLSTM (256 Units)
↓
BiLSTM (128 Units)
This captures:
- Gesture motion flow
- Timing relationships
- Sequential dependencies
- Future + past context
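
To make the "future + past context" point concrete, here is a forward-pass-only BiLSTM in NumPy: one LSTM reads the frames in order, a second reads them reversed, and their hidden states are concatenated per frame. Weights are random and untrained, the 258-feature input is a stand-in for whatever the preceding layers produce, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(seq, w, u, b, units):
    """Run one LSTM direction over seq (T, F); return hidden states (T, units)."""
    h = np.zeros(units)
    c = np.zeros(units)
    outputs = []
    for x_t in seq:
        z = x_t @ w + h @ u + b                 # all four gates at once, (4 * units,)
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def bilstm(seq, units, rng):
    """Concatenate a forward pass and a time-reversed pass: (T, 2 * units)."""
    feat = seq.shape[1]

    def params():
        return (rng.normal(scale=0.05, size=(feat, 4 * units)),
                rng.normal(scale=0.05, size=(units, 4 * units)),
                np.zeros(4 * units))

    forward = lstm_pass(seq, *params(), units)
    # Reverse the input, run the LSTM, then flip the outputs back into frame order.
    backward = lstm_pass(seq[::-1], *params(), units)[::-1]
    return np.concatenate([forward, backward], axis=1)

rng = np.random.default_rng(0)
frames = rng.normal(size=(60, 258))     # stand-in for one preprocessed video
layer1 = bilstm(frames, 256, rng)       # (60, 512)
layer2 = bilstm(layer1, 128, rng)       # (60, 256)
print(layer1.shape, layer2.shape)
```

In Keras, a comparable stack would wrap `LSTM(256)` and `LSTM(128)` in `Bidirectional`, which concatenates the two directions by default. Whether SignAI's second layer returns the full sequence or only the final state is not stated here.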
Most sign language recognition models predict isolated words rather than full sentences.
SignAI integrates the Gemini API with LangChain to convert the predicted tokens into language that is:
- grammatically coherent
- contextual
- readable
Example:
Predicted Words:
[I, go, market, tomorrow]
LLM Output:
I will go to the market tomorrow.
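
The reconstruction step comes down to building a prompt from the predicted words and passing it to a language model. The sketch below keeps the model behind a plain callable so it runs offline; `build_prompt`, `reconstruct_sentence`, and the prompt wording are hypothetical, and the real project routes this call through LangChain and the Gemini API.

```python
def build_prompt(tokens):
    """Turn predicted sign glosses into an instruction for a language model."""
    glosses = ", ".join(tokens)
    return (
        "The following words were recognized from Indian Sign Language, in order: "
        f"{glosses}. Rewrite them as one grammatically correct English sentence. "
        "Do not add information that the words do not imply."
    )

def reconstruct_sentence(tokens, llm):
    """`llm` is any callable mapping a prompt string to a completion string."""
    if not tokens:
        return ""
    return llm(build_prompt(tokens)).strip()

# Stand-in for a real model call so the sketch runs without an API key.
def fake_llm(prompt):
    return " I will go to the market tomorrow. "

print(reconstruct_sentence(["I", "go", "market", "tomorrow"], fake_llm))
```

Keeping the model behind a callable also makes the prompt logic testable without network access.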
The training workflow includes:
- Frame-level augmentation
- Temporal augmentation
- Landmark normalization
- Landmark-level augmentation
- Class balancing
- Early stopping
- Learning rate scheduling
| Parameter | Value |
|---|---|
| Sequence Length | 60 |
| Input Features | 258 |
| Optimizer | Adam |
| Loss Function | Categorical Crossentropy |
| Epochs | 100 |
| Batch Size | 16 |
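
One common way to implement the class balancing step is inverse-frequency class weights, so that rare signs contribute more to the loss. The README does not say which method SignAI uses; the sketch below shows the widely used "balanced" heuristic with hypothetical labels.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}

# Hypothetical imbalanced training set: 6, 3, and 1 samples per class.
labels = ["hello"] * 6 + ["thanks"] * 3 + ["sorry"] * 1
print(balanced_class_weights(labels))
```

Keras accepts such weights through the `class_weight` argument of `Model.fit`, keyed by integer class index, so string labels like these would first need mapping to their indices.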
outputs/training_history.png
Tracks:
- Training accuracy
- Validation accuracy
- Training loss
- Validation loss
outputs/confusion_matrix.png
Displays:
- Class-level predictions
- Misclassification patterns
- Generalization performance
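
For reference, a confusion matrix is a count of (true class, predicted class) pairs. A small pure-Python sketch with hypothetical labels, plus per-class recall as one way to read misclassification patterns from it:

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]

y_true = [0, 0, 1, 1, 2, 2]   # hypothetical class indices
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)                     # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(per_class_recall(cm))   # [0.5, 1.0, 0.5]
```

Off-diagonal cells show which signs are mistaken for which; here class 0 is confused with class 1 once, and class 2 with class 0 once.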
Frontend Webcam Stream
↓
Flask Backend API
↓
MediaPipe Landmark Extraction
↓
GCN + BiLSTM Inference
↓
Prediction Sequence
↓
Gemini Sentence Reconstruction
↓
Frontend Output
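
The flow above is a chain of stages in which each stage consumes the previous stage's output. A sketch of that composition, with every stage passed in as a callable so it runs without a camera, model, or API key; `transcribe` and the stand-in stages are hypothetical and do not reflect the actual code in `app.py`.

```python
def transcribe(clips, extract_landmarks, classify, reconstruct):
    """Run each recorded clip through the stages; return words plus sentence."""
    words = [classify(extract_landmarks(clip)) for clip in clips]
    return {"words": words, "sentence": reconstruct(words)}

# Stand-in stages, for illustration only.
result = transcribe(
    clips=["clip_a", "clip_b"],
    extract_landmarks=lambda clip: f"landmarks({clip})",
    classify=lambda feats: "hello" if "clip_a" in feats else "friend",
    reconstruct=lambda words: " ".join(words).capitalize() + ".",
)
print(result)  # {'words': ['hello', 'friend'], 'sentence': 'Hello friend.'}
```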
| Technology | Purpose |
|---|---|
| React + Vite | Frontend framework |
| Tailwind CSS | Styling |
| Framer Motion | Animations |
| Axios | API communication |
| Technology | Purpose |
|---|---|
| Flask | Backend API |
| TensorFlow / Keras | Deep learning |
| OpenCV | Video processing |
| MediaPipe | Landmark extraction |
| NumPy | Numerical computation |
| Technology | Purpose |
|---|---|
| GCN | Spatial skeletal learning |
| BiLSTM | Temporal sequence modeling |
| LangChain | LLM orchestration |
| Gemini API | Sentence generation |
SignAI/
│
├── backend/                                  # Flask backend API + inference server
│   ├── app.py                                # Main backend server and prediction routes
│   └── requirements.txt
│
├── frontend/                                 # React + Vite frontend application
│   ├── src/
│   │   ├── components/                       # Reusable UI components
│   │   │
│   │   ├── pages/                            # Main application pages
│   │   │   ├── Home.jsx                      # Landing page
│   │   │   ├── About.jsx                     # Project overview page
│   │   │   ├── Detection.jsx                 # Real-time sign detection page
│   │   │   └── Signs.jsx                     # Supported sign classes page
│   │   │
│   │   ├── App.jsx                           # Main React application
│   │   └── main.jsx                          # React entry point
│   │
│   ├── package.json                          # Frontend dependencies
│   ├── vite.config.js
│   └── tailwind.config.js
│
├── outputs/                                  # Training and evaluation outputs
│   ├── confusion_matrix.png                  # Confusion matrix visualization
│   ├── training_history.png                  # Accuracy/loss training curves
│   ├── processed_landmarks.pkl               # Preprocessed landmark dataset
│   ├── best_isl_model.keras                  # Saved trained BiLSTM model
│   └── best_isl_gcn_model.keras              # Best trained GCN model
│
├── Indian Sign Language Greetings Dataset/   # Dataset directory
│
├── ISL_transcription.ipynb                   # Complete training + research notebook
├── live_detection.py                         # Webcam-based live detection script
├── README.md                                 # Project documentation
└── requirements.txt                          # Root dependencies
git clone https://github.com/Code-r4Life/SignAI.git
cd SignAI

cd backend
pip install -r requirements.txt

Create .env:
GEMINI_API_KEY=your_api_key_here

Run backend:
python app.py

Run frontend:
cd frontend
npm install
npm run dev

Run live detection:
python live_detection.py

Controls:
SPACE → Start / Stop Recording
Q → Quit Application
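
The two controls amount to a small state machine. The sketch below keeps the key handling as a pure function so it can be checked without a camera; it assumes key codes arrive as integers, for example from `cv2.waitKey(1) & 0xFF` in an OpenCV loop, and `handle_key` is a hypothetical name rather than the function used in `live_detection.py`.

```python
def handle_key(key, recording):
    """Return (recording, running) after one key press."""
    if key in (ord("q"), ord("Q")):
        return recording, False          # quit the application
    if key == ord(" "):
        return not recording, True       # toggle recording on/off
    return recording, True               # any other key: no change

recording, running = False, True
for key in (ord(" "), ord("x"), ord(" "), ord("q")):
    recording, running = handle_key(key, recording)
    print(chr(key), recording, running)
```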
The training history graph (outputs/training_history.png) shows:
- Training Accuracy
- Validation Accuracy
- Training Loss
- Validation Loss
This helps analyze:
- model convergence
- overfitting
- learning stability
The confusion matrix visualizes:
- class-wise predictions
- model confusion patterns
- classification performance
I build smart, ML-integrated applications and responsive web platforms. Let's build something powerful together!
LinkedIn Profile

