CoderHariswar/TextFinOps


TexFinOps

AI-powered document reconciliation platform for cotton procurement in textile operations.

TexFinOps reduces manual effort in matching purchase data across invoices, weighbridge slips, and inward records. It extracts data from uploaded files, validates business rules, computes shortages/debit quantities, and exposes a workflow-friendly dashboard for review.


1) What this project is about

In cotton procurement, teams usually compare:

  • supplier invoice weight,
  • actual received weight,
  • commercial rate (candy rate),
  • allowable process loss.

This is often done manually in spreadsheets, which is slow and error-prone.

TexFinOps automates that process by combining:

  1. Document upload (invoice/weighbridge),
  2. OCR extraction of text,
  3. LLM + regex parsing into structured fields,
  4. Reconciliation logic to calculate shortage/debit,
  5. UI review for approval and correction.
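The reconciliation step (item 4) can be sketched as plain arithmetic. This is a minimal sketch under stated assumptions, not the repository's actual rule: the function name `reconcile`, the result fields, the rounding to two decimals, and the conversion of 1 candy = 355.62 kg (the figure commonly used in Indian cotton trade) are all illustrative choices.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

# Assumption: 1 candy = 355.62 kg; adjust if your contracts use another basis.
KG_PER_CANDY = Decimal("355.62")
CENT = Decimal("0.01")


@dataclass(frozen=True)
class ReconciliationResult:
    shortage_kg: Decimal
    allowable_loss_kg: Decimal
    debit_qty_kg: Decimal
    debit_amount: Decimal


def reconcile(invoice_kg: Decimal, received_kg: Decimal,
              candy_rate: Decimal, allowable_loss_pct: Decimal) -> ReconciliationResult:
    """Hypothetical rule: debit only the shortage that exceeds the allowable loss."""
    # Shortage is never negative; an over-delivery is treated as zero shortage.
    shortage = max(invoice_kg - received_kg, Decimal("0"))
    # Allowable process loss as a percentage of the invoiced weight.
    allowable = (invoice_kg * allowable_loss_pct / Decimal("100")).quantize(CENT, ROUND_HALF_UP)
    debit_qty = max(shortage - allowable, Decimal("0"))
    # Convert the commercial candy rate into a per-kg rate before pricing the debit.
    rate_per_kg = candy_rate / KG_PER_CANDY
    debit_amount = (debit_qty * rate_per_kg).quantize(CENT, ROUND_HALF_UP)
    return ReconciliationResult(shortage, allowable, debit_qty, debit_amount)


# Example: 10,000 kg invoiced, 9,900 kg received, 0.5% allowable loss,
# candy rate 35,562 (i.e. 100 per kg under the assumed conversion).
result = reconcile(Decimal("10000"), Decimal("9900"), Decimal("35562"), Decimal("0.5"))
print(result.debit_qty_kg, result.debit_amount)  # 50.00 5000.00
```

Using `Decimal` rather than `float` keeps the monetary result deterministic, which matters when the figure ends up on a debit note.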

2) Why this project is significant

  • Faster turnaround: reduces the time from document receipt to reconciliation.
  • Lower human error: systematic parsing and deterministic calculations.
  • Auditability: stores source documents and parsed fields in one place.
  • Scalability: async worker model supports growing document volume.
  • Finance control: highlights shortages and debit-note candidates quickly.

3) Technology stack and why each is used

Backend

  • FastAPI: high-performance API framework with automatic OpenAPI docs (/docs).
  • SQLModel + SQLAlchemy (async): typed models and async DB operations.
  • PostgreSQL: reliable relational persistence for transactional records.
  • Celery: background processing for OCR/parsing tasks.
  • Redis: message broker/result backend for Celery.
  • PaddleOCR + pdf2image + Pillow: OCR pipeline for PDFs/images.
  • LangChain + provider LLM APIs: semantic extraction from noisy document text.
  • Pydantic / pydantic-settings: config + schema validation.

Frontend

  • React + TypeScript: typed component-based UI.
  • Vite: fast local development and build tooling.
  • React Router: page routing.
  • TanStack Query: server-state fetching/caching.
  • React Hook Form + Zod: form handling and validation.
  • TanStack Table: data grid/table interactions.
  • Tailwind CSS + Radix UI: fast, accessible UI composition.
  • react-pdf / react-dropzone: PDF preview + document upload UX.

DevOps / Runtime

  • Docker + Docker Compose: reproducible multi-service environment.
  • Service split (web/worker/db/redis/frontend): clear separation of concerns.

4) High-level architecture

flowchart LR
  U[User] --> FE[React Frontend]
  FE -->|HTTP REST| API[FastAPI Backend]
  API --> DB[(PostgreSQL)]
  API --> FS[(Uploads Storage)]
  API -->|Queue task| R[(Redis)]
  R --> W[Celery Worker]
  W --> OCR[OCR Service]
  W --> LLM[LangChain Parser + LLM]
  W --> DB
  W --> FS
  FE -->|poll status/read data| API

Why this architecture

  • OCR is CPU-intensive and LLM calls have variable latency; running both in a background worker prevents them from blocking API requests.
  • API stays responsive while worker processes documents in background.
  • Postgres gives consistent state for records and reconciliation outputs.
  • Redis decouples ingestion from processing for reliability and scale.
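The decoupling described above can be illustrated with a standard-library stand-in. In this sketch `queue.Queue` plays the role of Redis and a background thread plays the role of the Celery worker; it is not how the repository wires Celery, and `process_document` and `run_demo` are hypothetical names. The point is only that enqueueing returns immediately while the slow work happens elsewhere.

```python
import queue
import threading
from typing import Dict, Iterable


def process_document(doc_id: int) -> str:
    """Placeholder for the slow OCR + parsing step."""
    return f"parsed-{doc_id}"


def worker(tasks: "queue.Queue", results: Dict[int, str]) -> None:
    """Consume document ids until a None sentinel arrives."""
    while True:
        doc_id = tasks.get()
        if doc_id is None:  # sentinel: shut the worker down
            tasks.task_done()
            break
        results[doc_id] = process_document(doc_id)
        tasks.task_done()


def run_demo(doc_ids: Iterable[int]) -> Dict[int, str]:
    tasks: "queue.Queue" = queue.Queue()
    results: Dict[int, str] = {}
    thread = threading.Thread(target=worker, args=(tasks, results), daemon=True)
    thread.start()
    for doc_id in doc_ids:
        tasks.put(doc_id)  # returns immediately, like an API handler queueing a task
    tasks.put(None)
    tasks.join()   # wait until every queued item has been processed
    thread.join()
    return results


print(run_demo([1, 2]))
```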

5) Sequential workflow (end-to-end)

  1. User creates a cotton inward record from dashboard.
  2. User uploads invoice and/or weighbridge documents.
  3. Backend validates type/size and stores file under uploads.
  4. Backend creates Document row and queues Celery task.
  5. Worker picks task and runs OCR extraction.
  6. Parser converts OCR text to structured fields (LLM first, regex fallback).
  7. Parsed values update Document and linked CottonInward.
  8. Reconciliation service calculates shortage/debit metrics.
  9. Inward status updates (e.g., processing/reconciled/debit required).
  10. Frontend displays results for review and action.
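Step 6 ("LLM first, regex fallback") can be sketched as follows. The field names, the regex patterns, and the `llm_parser` callable are hypothetical and stand in for whatever the parser service actually uses; real invoice layouts will need their own patterns.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical patterns; tune these per document template.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no|number)\.?\s*[:\-]?\s*([A-Z0-9/\-]+)", re.IGNORECASE
    ),
    "net_weight_kg": re.compile(
        r"net\s*(?:weight|wt)\.?\s*[:\-]?\s*([\d,]+(?:\.\d+)?)\s*kg", re.IGNORECASE
    ),
}


def parse_with_regex(text: str) -> Dict[str, str]:
    """Extract whichever fields the patterns can find in the OCR text."""
    fields: Dict[str, str] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            value = match.group(1)
            # Strip thousands separators from numeric fields.
            fields[name] = value.replace(",", "") if name.endswith("_kg") else value
    return fields


def parse_document(
    text: str, llm_parser: Optional[Callable[[str], Dict[str, str]]] = None
) -> Dict[str, str]:
    """Try the LLM parser first; fall back to regex if it fails or returns nothing."""
    if llm_parser is not None:
        try:
            fields = llm_parser(text)
            if fields:
                return fields
        except Exception:
            pass  # any LLM failure (timeout, quota, bad output) triggers the fallback
    return parse_with_regex(text)


sample = "Invoice No: INV/2024/001\nNet Weight: 9,900.50 kg"
print(parse_document(sample))
```

Passing the LLM parser in as a callable keeps the fallback path testable without network access or an API key.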

6) Repository structure

  • backend/ → FastAPI, models, schemas, OCR/parser/reconciliation services, Celery worker.
  • frontend/ → React dashboard and review UI.
  • uploads/ → stored uploaded files (invoice/weighbridge).
  • docker-compose.yml → local orchestration for all services.
  • .env.example → environment template.

7) API overview

Base URL: http://localhost:8000/api/v1

Cotton inward

  • GET /cotton-inwards → list with pagination/filter/search
  • POST /cotton-inwards → create record
  • GET /cotton-inwards/{id} → details
  • PATCH /cotton-inwards/{id} → update record
  • DELETE /cotton-inwards/{id} → delete
  • POST /cotton-inwards/{id}/reconcile → manual reconcile

Documents

  • POST /documents/upload → upload invoice/weighbridge
  • GET /documents/inward/{cotton_inward_id} → list inward documents
  • GET /documents/{document_id} → document metadata
  • GET /documents/{document_id}/status → OCR processing status
  • GET /documents/{document_id}/download → download file
  • DELETE /documents/{document_id} → remove document/file

Health/docs:

  • GET /health
  • GET /docs
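A client can poll the status endpoint listed above until processing finishes. The sketch below uses only the standard library. The response shape is an assumption: it supposes a JSON object with a `status` field whose terminal values are `completed` and `failed`, so check the generated docs at /docs for the real schema. `wait_for_document` and `fetch_json` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "http://localhost:8000/api/v1"

# Assumption: these are the terminal status values; the real API may differ.
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_json(url: str) -> Dict[str, Any]:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def wait_for_document(
    document_id: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_json,
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll GET /documents/{document_id}/status until a terminal status appears."""
    url = f"{BASE_URL}/documents/{document_id}/status"
    for _ in range(max_attempts):
        payload = fetch(url)
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        sleep(interval_s)
    raise TimeoutError(f"document {document_id} not finished after {max_attempts} polls")
```

With the stack running, `wait_for_document("<document_id>")` blocks until the worker reports a terminal status or the attempt limit is reached. The `fetch` and `sleep` parameters exist so the loop can be exercised without a live server.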

8) User guide (how to use)

A. Start the platform (Docker, recommended)

  1. Go to project root:
    • cd TextFinOps (the directory where you cloned the repository; the author's local path is /home/hariswar/Documents/TextFinOps)
  2. Ensure .env exists (copy from .env.example if needed).
  3. Start services:
    • docker compose up -d --build
  4. Open:
    • Frontend: http://localhost:5173
    • API docs: http://localhost:8000/docs

B. Daily usage flow

  1. Open dashboard and create an inward entry.
  2. Upload invoice and weighbridge files.
  3. Wait for processing status to complete.
  4. Open inward detail page and review extracted values.
  5. Verify reconciliation output (shortage, allowable loss, debit qty/amount).
  6. Update/override received weight if needed and trigger reconcile.
  7. Download source documents if required for audit.

C. Stop services

  • docker compose down

D. View logs when troubleshooting

  • docker compose logs --tail=200 web
  • docker compose logs --tail=200 worker
  • docker compose logs --tail=200 frontend

9) Configuration

Important environment variables:

  • DATABASE_URL → async Postgres URL used by backend/worker.
  • REDIS_URL / CELERY_BROKER_URL / CELERY_RESULT_BACKEND → queue infra.
  • UPLOAD_DIR / MAX_UPLOAD_SIZE_MB → document storage rules.
  • GOOGLE_API_KEY / OPENAI_API_KEY / ANTHROPIC_API_KEY → optional LLM provider key.

Use only one provider key unless your parser logic explicitly supports fallback order.
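One way to make the provider choice explicit is to resolve it once at startup. The project uses pydantic-settings for configuration; the sketch below deliberately swaps in a plain dataclass so it runs with the standard library alone. The precedence order, the default values, and the names `Settings` and `load_settings` are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed precedence: the first provider with a non-empty key wins.
PROVIDER_KEYS = (
    ("google", "GOOGLE_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
)


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    upload_dir: str
    max_upload_size_mb: int
    llm_provider: Optional[str]  # None means "regex parsing only"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from environment variables (defaults are illustrative)."""
    provider = next((name for name, var in PROVIDER_KEYS if env.get(var)), None)
    return Settings(
        database_url=env["DATABASE_URL"],  # required: raises KeyError if missing
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        upload_dir=env.get("UPLOAD_DIR", "uploads"),
        max_upload_size_mb=int(env.get("MAX_UPLOAD_SIZE_MB", "10")),
        llm_provider=provider,
    )
```

Failing fast on a missing `DATABASE_URL` surfaces misconfiguration at startup rather than on the first request.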


10) Security and operational notes

  • Do not commit real API keys in .env.
  • Prefer secrets management in production.
  • Restrict CORS origins for non-dev environments.
  • Add authentication/authorization before production usage.
  • Add rate limits and file scanning for upload hardening.
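As a first step toward upload hardening, the file type can be checked from the leading bytes rather than from the client-supplied extension or Content-Type. This is a minimal sketch: the allowed types (PDF, PNG, JPEG), the 10 MB default, and the function names are assumptions, and a signature check is not a substitute for malware scanning.

```python
from typing import Optional

# Leading "magic bytes" for the file types assumed to be accepted.
SIGNATURES = {
    "application/pdf": b"%PDF-",
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
}


def sniff_type(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the file's leading bytes, if recognised."""
    for mime, signature in SIGNATURES.items():
        if data.startswith(signature):
            return mime
    return None


def validate_upload(data: bytes, max_size_mb: int = 10) -> str:
    """Reject oversized or unrecognised files; return the detected MIME type."""
    if len(data) > max_size_mb * 1024 * 1024:
        raise ValueError(f"file exceeds {max_size_mb} MB limit")
    mime = sniff_type(data)
    if mime is None:
        raise ValueError("unsupported file type")
    return mime


print(validate_upload(b"%PDF-1.7\n..."))  # application/pdf
```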

11) Current limitations and next steps

  • Only a limited set of document templates has been handled so far; new layouts may need parser tuning.
  • OCR quality depends on scan quality.
  • No auth yet (suitable for controlled internal environments only).
  • Consider adding:
    • role-based access control,
    • retry/monitoring dashboards for Celery,
    • model/parse confidence review queue,
    • automated test coverage for parsing/reconciliation.

12) Quick verification checklist

After startup, verify:

  • http://localhost:8000/health returns a JSON response reporting a healthy status.
  • http://localhost:8000/docs loads API docs.
  • http://localhost:5173 loads dashboard.
  • Uploading one sample file creates a document and processing status endpoint responds.

If all pass, TexFinOps is running correctly.
