AI-powered document reconciliation platform for cotton procurement in textile operations.
TexFinOps reduces manual effort in matching purchase data across invoices, weighbridge slips, and inward records. It extracts data from uploaded files, validates business rules, computes shortages/debit quantities, and exposes a workflow-friendly dashboard for review.
In cotton procurement, teams usually compare:
- supplier invoice weight,
- actual received weight,
- commercial rate (candy rate),
- allowable process loss.
This is often done manually in spreadsheets, which is slow and error-prone.
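The comparison above can be written as a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the formulas (shortage = invoice weight minus received weight, allowable loss as a percentage of invoice weight, debit only on the excess) and the 355.62 kg candy conversion are assumptions, and `reconcile` and `ReconciliationResult` are hypothetical names, not the project's actual reconciliation service.

```python
from dataclasses import dataclass

# Assumption: one candy of cotton is taken as 355.62 kg.
KG_PER_CANDY = 355.62


@dataclass
class ReconciliationResult:
    shortage_kg: float
    allowable_loss_kg: float
    debit_qty_kg: float
    debit_amount: float


def reconcile(invoice_kg: float, received_kg: float,
              candy_rate: float, allowable_loss_pct: float) -> ReconciliationResult:
    """Hypothetical reconciliation: debit only the shortage beyond allowable loss."""
    if invoice_kg <= 0 or received_kg < 0:
        raise ValueError("weights must be positive")
    # Shortage is never negative: an over-delivery is treated as zero shortage.
    shortage = max(invoice_kg - received_kg, 0.0)
    # Allowable process loss is assumed to be a percentage of invoice weight.
    allowable = invoice_kg * allowable_loss_pct / 100.0
    debit_qty = max(shortage - allowable, 0.0)
    # Convert the per-candy rate to a per-kg rate before pricing the debit.
    debit_amount = debit_qty * candy_rate / KG_PER_CANDY
    return ReconciliationResult(
        shortage_kg=round(shortage, 2),
        allowable_loss_kg=round(allowable, 2),
        debit_qty_kg=round(debit_qty, 2),
        debit_amount=round(debit_amount, 2),
    )
```

For example, an invoice of 10,000 kg with 9,900 kg received and a 0.5% allowable loss gives a 100 kg shortage, of which 50 kg is within tolerance and 50 kg becomes a debit-note candidate.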
TexFinOps automates that process by combining:
- Document upload (invoice/weighbridge),
- OCR extraction of text,
- LLM + regex parsing into structured fields,
- Reconciliation logic to calculate shortage/debit,
- UI review for approval and correction.
- Faster turnaround: reduces the time from document receipt to reconciliation.
- Lower human error: systematic parsing and deterministic calculations.
- Auditability: stores source documents and parsed fields in one place.
- Scalability: async worker model supports growing document volume.
- Finance control: highlights shortages and debit-note candidates quickly.
- FastAPI: high-performance API framework with automatic OpenAPI docs (/docs).
- SQLModel + SQLAlchemy (async): typed models and async DB operations.
- PostgreSQL: reliable relational persistence for transactional records.
- Celery: background processing for OCR/parsing tasks.
- Redis: message broker/result backend for Celery.
- PaddleOCR + pdf2image + Pillow: OCR pipeline for PDFs/images.
- LangChain + provider LLM APIs: semantic extraction from noisy document text.
- Pydantic / pydantic-settings: config + schema validation.
- React + TypeScript: typed component-based UI.
- Vite: fast local development and build tooling.
- React Router: page routing.
- TanStack Query: server-state fetching/caching.
- React Hook Form + Zod: form handling and validation.
- TanStack Table: data grid/table interactions.
- Tailwind CSS + Radix UI: fast, accessible UI composition.
- react-pdf / react-dropzone: PDF preview + document upload UX.
- Docker + Docker Compose: reproducible multi-service environment.
- Service split (web/worker/db/redis/frontend): clear separation of concerns.
flowchart LR
U[User] --> FE[React Frontend]
FE -->|HTTP REST| API[FastAPI Backend]
API --> DB[(PostgreSQL)]
API --> FS[(Uploads Storage)]
API -->|Queue task| R[(Redis)]
R --> W[Celery Worker]
W --> OCR[OCR Service]
W --> LLM[LangChain Parser + LLM]
W --> DB
W --> FS
FE -->|poll status/read data| API
- OCR and LLM tasks have variable latency, and OCR in particular is CPU-intensive; keeping them async prevents API blocking.
- API stays responsive while worker processes documents in background.
- Postgres gives consistent state for records and reconciliation outputs.
- Redis decouples ingestion from processing for reliability and scale.
- User creates a cotton inward record from dashboard.
- User uploads invoice and/or weighbridge documents.
- Backend validates type/size and stores file under uploads.
- Backend creates a Document row and queues a Celery task.
- Worker picks up the task and runs OCR extraction.
- Parser converts OCR text to structured fields (LLM first, regex fallback).
- Parsed values update the Document and the linked CottonInward.
- Reconciliation service calculates shortage/debit metrics.
- Inward status updates (e.g., processing/reconciled/debit required).
- Frontend displays results for review and action.
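The "LLM first, regex fallback" parsing step in the workflow above might look roughly like the sketch below. Everything named here is hypothetical: `llm_extract` stands in for whatever LangChain call the project makes, and the two regex patterns and field names (`invoice_number`, `net_weight_kg`) are illustrative assumptions rather than the project's real templates.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical patterns; real invoices and weighbridge slips will need tuning.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no|number|#)\.?\s*[:\-]?\s*([A-Z0-9][A-Z0-9/\-]*)", re.I),
    "net_weight_kg": re.compile(
        r"net\s*(?:wt|weight)\.?\s*[:\-]?\s*([\d,]+(?:\.\d+)?)\s*kg", re.I),
}


def regex_parse(text: str) -> Dict[str, object]:
    """Deterministic fallback: pull known fields out of OCR text."""
    fields: Dict[str, object] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    if "net_weight_kg" in fields:
        # Strip thousands separators before converting to a number.
        fields["net_weight_kg"] = float(str(fields["net_weight_kg"]).replace(",", ""))
    return fields


def parse_document(
    text: str,
    llm_extract: Optional[Callable[[str], Dict[str, object]]] = None,
) -> Tuple[Dict[str, object], str]:
    """Try the LLM extractor first; fall back to regex on error or empty output."""
    if llm_extract is not None:
        try:
            fields = llm_extract(text)
            if fields:
                return fields, "llm"
        except Exception:
            # Any provider failure (timeout, quota, bad output) triggers the fallback.
            pass
    return regex_parse(text), "regex"
```

Returning the source alongside the fields ("llm" or "regex") is a design choice made for this sketch; it would let a reviewer see which path produced a given value.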
- backend/ → FastAPI, models, schemas, OCR/parser/reconciliation services, Celery worker.
- frontend/ → React dashboard and review UI.
- uploads/ → stored uploaded files (invoice/weighbridge).
- docker-compose.yml → local orchestration for all services.
- .env.example → environment template.
Base URL: http://localhost:8000/api/v1
- GET /cotton-inwards → list with pagination/filter/search
- POST /cotton-inwards → create record
- GET /cotton-inwards/{id} → details
- PATCH /cotton-inwards/{id} → update record
- DELETE /cotton-inwards/{id} → delete
- POST /cotton-inwards/{id}/reconcile → manual reconcile
- POST /documents/upload → upload invoice/weighbridge
- GET /documents/inward/{cotton_inward_id} → list inward documents
- GET /documents/{document_id} → document metadata
- GET /documents/{document_id}/status → OCR processing status
- GET /documents/{document_id}/download → download file
- DELETE /documents/{document_id} → remove document/file
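A client can poll the status endpoint listed above until processing finishes. The sketch below is a minimal illustration using only the standard library. It assumes the endpoint returns JSON with a `status` field and that `completed` and `failed` are the terminal values; the real response shape is not documented here, so check /docs before relying on it. `fetch_status` and `wait_for_document` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Sequence

BASE_URL = "http://localhost:8000/api/v1"


def fetch_status(document_id: str) -> Dict[str, object]:
    """GET /documents/{document_id}/status and decode the JSON body."""
    url = f"{BASE_URL}/documents/{document_id}/status"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_document(
    document_id: str,
    fetch: Callable[[str], Dict[str, object]] = fetch_status,
    interval_s: float = 2.0,
    max_attempts: int = 30,
    terminal: Sequence[str] = ("completed", "failed"),  # assumed status values
) -> Dict[str, object]:
    """Poll until the (assumed) status field reaches a terminal value."""
    for attempt in range(max_attempts):
        payload = fetch(document_id)
        if payload.get("status") in terminal:
            return payload
        if attempt < max_attempts - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"document {document_id} still processing after {max_attempts} polls")
```

The `fetch` parameter is injectable so the loop can be exercised without a running backend.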
Health/docs:
- GET /health
- GET /docs
- Go to project root:
cd /home/hariswar/Documents/TextFinOps
- Ensure .env exists (copy from .env.example if needed).
- Start services:
docker compose up -d --build
- Open:
  - Frontend: http://localhost:5173
  - API docs: http://localhost:8000/docs
- Open dashboard and create an inward entry.
- Upload invoice and weighbridge files.
- Wait for processing status to complete.
- Open inward detail page and review extracted values.
- Verify reconciliation output (shortage, allowable loss, debit qty/amount).
- Update/override received weight if needed and trigger reconcile.
- Download source documents if required for audit.
docker compose down
docker compose logs --tail=200 web
docker compose logs --tail=200 worker
docker compose logs --tail=200 frontend
Important environment variables:
- DATABASE_URL → async Postgres URL used by backend/worker.
- REDIS_URL / CELERY_BROKER_URL / CELERY_RESULT_BACKEND → queue infra.
- UPLOAD_DIR / MAX_UPLOAD_SIZE_MB → document storage rules.
- GOOGLE_API_KEY / OPENAI_API_KEY / ANTHROPIC_API_KEY → optional LLM provider key.
Use only one provider key unless your parser logic explicitly supports fallback order.
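One way to enforce the single-key rule is to resolve the provider explicitly at startup. The sketch below is an assumption about how that could be done, not the project's actual settings code: `select_provider` and the provider labels (`google`, `openai`, `anthropic`) are hypothetical, and only the environment variable names come from the list above.

```python
import os
from typing import Mapping, Optional, Sequence

# Environment variable names are from this README; provider labels are illustrative.
PROVIDER_KEYS = {
    "google": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def select_provider(
    env: Mapping[str, str] = os.environ,
    fallback_order: Optional[Sequence[str]] = None,
) -> Optional[str]:
    """Return the provider to use, or None when no key is configured."""
    configured = [
        name for name, var in PROVIDER_KEYS.items()
        if env.get(var, "").strip()
    ]
    if not configured:
        # Keys are optional; with none set, parsing would rely on regex only.
        return None
    if fallback_order is None:
        if len(configured) > 1:
            raise ValueError(
                f"multiple provider keys set ({', '.join(configured)}); "
                "set only one or pass an explicit fallback order"
            )
        return configured[0]
    # With an explicit order, pick the first configured provider in that order.
    for name in fallback_order:
        if name in configured:
            return name
    return None
```

Failing fast on ambiguous configuration keeps the choice of provider visible rather than dependent on lookup order.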
- Do not commit real API keys in .env.
- Prefer secrets management in production.
- Restrict CORS origins for non-dev environments.
- Add authentication/authorization before production usage.
- Add rate limits and file scanning for upload hardening.
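As a starting point for upload hardening, the type/size check the backend performs could be sketched as below. This is an assumption-laden illustration: the allowed extensions are inferred from the PDF/image OCR pipeline, the 10 MB default is a placeholder for MAX_UPLOAD_SIZE_MB, and `validate_upload` is a hypothetical helper. Extension checks alone do not replace content scanning.

```python
from pathlib import PurePosixPath

# Assumption: the OCR pipeline accepts PDFs and common image formats.
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def validate_upload(filename: str, size_bytes: int, max_upload_size_mb: int = 10) -> str:
    """Return a sanitized file name, or raise ValueError if the upload is rejected."""
    # Keep only the final path component so "../" tricks cannot escape the upload dir.
    safe_name = PurePosixPath(filename.replace("\\", "/")).name
    if not safe_name:
        raise ValueError("missing file name")
    extension = PurePosixPath(safe_name).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    if size_bytes <= 0:
        raise ValueError("empty file")
    if size_bytes > max_upload_size_mb * 1024 * 1024:
        raise ValueError(f"file exceeds {max_upload_size_mb} MB limit")
    return safe_name
```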
- Limited document templates may need parser tuning.
- OCR quality depends on scan quality.
- No auth yet (suitable for controlled internal environments only).
- Consider adding:
  - role-based access control,
  - retry/monitoring dashboards for Celery,
  - model/parse confidence review queue,
  - automated test coverage for parsing/reconciliation.
After startup, verify:
- http://localhost:8000/health returns healthy JSON.
- http://localhost:8000/docs loads API docs.
- http://localhost:5173 loads dashboard.
- Uploading one sample file creates a document and the processing status endpoint responds.
If all pass, TexFinOps is running correctly.